Conversation
I would simply allow users to implement Leeway in their Claims object. If you would like to provide this, I think it can be accomplished by adding a private variable to the StandardClaims object that is set in the default parser.
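As a rough illustration of that suggestion: a user can already get leeway today without any interface change by embedding StandardClaims and overriding Valid. This is a sketch, not code from the PR; the LeewayClaims name and its field are assumptions.

```go
package example

import (
	"errors"
	"time"

	"github.com/dgrijalva/jwt-go"
)

// LeewayClaims embeds StandardClaims and re-runs the time-based checks
// with a tolerance applied. Name and field are illustrative only.
type LeewayClaims struct {
	jwt.StandardClaims
	Leeway int64 // allowed clock skew, in seconds
}

func (c LeewayClaims) Valid() error {
	now := time.Now().Unix()
	// Shift the comparison time by the leeway instead of comparing exactly.
	if !c.VerifyExpiresAt(now-c.Leeway, false) {
		return errors.New("token is expired")
	}
	if !c.VerifyIssuedAt(now+c.Leeway, false) {
		return errors.New("token used before issued")
	}
	if !c.VerifyNotBefore(now+c.Leeway, false) {
		return errors.New("token is not valid yet")
	}
	return nil
}
```

Passing `&LeewayClaims{Leeway: 60}` to ParseWithClaims would then validate the token with a minute of slack.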
I guess I wouldn't make the struct a requirement for the interface. Oftentimes it may just be empty.
type Claims interface {
-	Valid() error
+	Valid(opts *ValidationOptions) error
This interface would stay the same.
This interface should change, IMO. Leeway is not part of the claims; it is part of the validation of claims. In fact, you could even add options like pyjwt provides, e.g. verify_iat: false to skip iat verification.
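For instance, pyjwt's options dict could translate into something like the following in Go. This is a hypothetical shape, not the PR's actual struct:

```go
// Hypothetical: per-claim switches plus leeway, mirroring pyjwt's
// options such as {"verify_iat": False}. None of these fields exist
// in jwt-go as-is.
type ValidationOptions struct {
	LeewaySeconds int64 // tolerated clock skew
	VerifyExp     bool  // check "exp"
	VerifyIat     bool  // check "iat"
	VerifyNbf     bool  // check "nbf"
}
```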
I'm sorta on the fence about this, too. It seems the most graceful way to tunnel options through to the validator. We are currently missing validations on some claims as it is. I think we can sign off 3.0 the way it is and continue working on this for the next update.
Release 3.0.0
I agree with signing off on 3.0 the way it is. There is nothing stopping someone from using the Claims interface now to implement Leeway, and I don't see any reason why the addition of Leeway cannot be done in a backwards-compatible way after 3.0 lands. 👍 Essentially, my issue with the way this was implemented is that you're changing the interface of the Claims type. Thank you!
I don’t think that’s the end of the world if nil is considered valid. -dave
Either way, 3.0 has shipped.
Parser flag to skip claims validation during token parsing (usage sketched after this list)
Fixed migration guide request.ParseFromRequest example code
Use contains to buffer against differing exec time
Fill in missing string format var
it allows test functions to use it in other files
Make all expired test messages consistent
Clean up unneeded language in test cases
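The first commit in this list adds the escape hatch that later comments rely on: skip the built-in claims validation at parse time, then validate the claims yourself. A minimal usage sketch against the v3 API (tokenString and keyFunc assumed to be in scope):

```go
// Parse without running Claims.Valid, so a stale or future iat does not
// fail the parse; the caller applies its own leeway-aware checks after.
parser := &jwt.Parser{SkipClaimsValidation: true}
claims := &jwt.StandardClaims{}
token, err := parser.ParseWithClaims(tokenString, claims, keyFunc)
```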
Perhaps the behaviour of Parse and ParseWithClaims should change to support ValidationOptions; otherwise the options will always be nil, which introduces bugs. Been debugging this for a couple of hours 🙂
https://github.com/dgrijalva/jwt-go/blob/master/token.go#L89
https://github.com/dgrijalva/jwt-go/blob/master/token.go#L93
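The two lines linked above are where the parser calls Claims.Valid. The gist of the report: unless the parser itself carries the options and passes them through, the opts argument at those call sites is always nil. A minimal self-contained sketch of the fix being asked for (all names are assumptions):

```go
package sketch

// ValidationOptions and Claims as proposed in this PR.
type ValidationOptions struct {
	Leeway int64 // seconds
}

type Claims interface {
	Valid(opts *ValidationOptions) error
}

// Parser carries the options so they actually reach Valid.
type Parser struct {
	Options *ValidationOptions
}

// validate is the step Parse/ParseWithClaims would run after the
// signature checks; today the equivalent call is Valid(nil).
func (p *Parser) validate(c Claims) error {
	return c.Valid(p.Options)
}
```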
@@ -67,11 +67,16 @@ func (m MapClaims) VerifyNotBefore(cmp int64, req bool) bool {
// There is no accounting for clock skew.
There's now
@@ -29,14 +36,20 @@ type StandardClaims struct {
// There is no accounting for clock skew.
Same here
claims.go
// Options passed in to Claims.Valid
// Currently only supports Leeway (more coming soon)
type ValidationOptions struct {
	Leeway int64 // allow a bit (a minute or so) of extra time to allow for clock skew
}
Consider naming it "LeewaySeconds" to make it clear that it should be specified in seconds
A time.Duration might even be better.
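Sketch of the time.Duration variant, which makes the unit explicit at the call site (illustrative only, not the PR's code):

```go
package sketch

import "time"

// Leeway as a time.Duration: no "Seconds" suffix needed, and callers
// write the unit themselves, e.g. Leeway: 2 * time.Minute.
type ValidationOptions struct {
	Leeway time.Duration
}

// Comparing against Unix-timestamp claims then looks like:
//   cmp := time.Now().Add(-opts.Leeway).Unix()
```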
Is this moving forward?
Is leeway support going to be introduced anytime soon? Not having it in place is kind of a blocker for certain scenarios.
…ly. should we support 11 or jump to latest for new version?
… when instantly validating
# Conflicts:
#	claims.go
#	map_claims.go
#	parser.go
For future reference, a full working example can be found below. I struggled with this myself, and the trick I was missing was to add the
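(The commenter's original example was not preserved in this thread. Below is a hedged reconstruction of one pattern that works on v3: skip the parser's built-in validation, then re-check the time claims with leeway. The leeway value and overall structure are assumptions, not the commenter's exact code.)

```go
package main

import (
	"fmt"
	"time"

	"github.com/dgrijalva/jwt-go"
)

func main() {
	key := []byte("secret")

	// Simulate clock skew: the issuer's clock is 30s ahead of ours.
	claims := jwt.StandardClaims{
		IssuedAt:  time.Now().Add(30 * time.Second).Unix(),
		ExpiresAt: time.Now().Add(time.Hour).Unix(),
	}
	tokenString, _ := jwt.NewWithClaims(jwt.SigningMethodHS256, claims).SignedString(key)

	// Skip built-in validation so the future iat doesn't fail the parse.
	parser := &jwt.Parser{SkipClaimsValidation: true}
	parsed := jwt.StandardClaims{}
	if _, err := parser.ParseWithClaims(tokenString, &parsed, func(t *jwt.Token) (interface{}, error) {
		return key, nil
	}); err != nil {
		fmt.Println("parse error:", err)
		return
	}

	// Re-run the time checks with a minute of slack.
	const leeway = int64(60)
	now := time.Now().Unix()
	if !parsed.VerifyIssuedAt(now+leeway, false) || !parsed.VerifyExpiresAt(now-leeway, false) {
		fmt.Println("token outside allowed window")
		return
	}
	fmt.Println("token accepted with leeway")
}
```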
Leeway support has landed on the 4.0 branch.
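For anyone landing here later, option-style leeway usage might look like the sketch below; NewParser and WithLeeway are assumed names that I have not verified against the 4.0 branch, so check that branch for the actual API.

```go
// Hypothetical v4-style usage only; names are assumptions.
parser := jwt.NewParser(jwt.WithLeeway(2 * time.Minute))
token, err := parser.Parse(tokenString, keyFunc)
```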
(Cross-repository reference: heketi's release-8 merge cites this PR. Its "jwt: disable iat claim in jwt" commit notes that clock skew between the machine that generates a token and the machine that verifies it can trigger "Token used before issued" errors, so heketi disabled iat verification until leeway support landed in jwt-go. A follow-up commit, "middleware: implement new claims type and validation function", added a derived claims type that validates iat with a leeway of 120 seconds by default, overridable via the HEKETI_JWT_IAT_LEEWAY_SECONDS environment variable.)
* Dockerfile: update the Maintainer to the heketi-devel list Signed-off-by: Michael Adam <obnox@redhat.com> * docker: fix fromsource/Dockerfile build This is a bare minimum set of fixes needed to get the container image building on dockerhub again. Signed-off-by: John Mulligan <jmulligan@redhat.com> * extras: add heketi containers for the CentOS Registry Add container images for the CentOS Container Registry that is available at https://registry.centos.org/containers/ . The images are built from RPMs that are available from the CentOS Storage SIG. Currently two flavours are provided: 1. built from the CentOS Storage SIG stable/released repository 2. built from the CentOS Storage SIG testing repository Signed-off-by: Niels de Vos <ndevos@redhat.com> * db-upgrade: reduce complexity in addVolumeIdInBrickEntry() Instead looping over clusters -> volumes in the cluster-> bricks in the volume loop over volumes -> bricks in the volume This reduces the complexity, while at the same time preventing startup from failing if cluster's volume-list has a volume-id which does not have a volume entry in the DB any more. This patch is best viewn with "git show -w" (ignore white space changes). Signed-off-by: Michael Adam <obnox@redhat.com> * db-upgrade: prevent upgrade from failing with orphaned brickid in volume If a volume links back to a brick-id that does not exist any more in the db, this does not harm otherwise, so let's not have heketi refuse to start up in this situation. Signed-off-by: Michael Adam <obnox@redhat.com> * apps:glusterfs: ignore nonexistent bricks on device in DeleteBricksWithEmptyPath Signed-off-by: Michael Adam <obnox@redhat.com> * apps:glusterfs: ignore nonexistent device on node in DeleteBricksWithEmptyPath Signed-off-by: Michael Adam <obnox@redhat.com> * apps:glusterfs: ignore nonexistent node in cluster in DeleteBricksWithEmptyPath Signed-off-by: Michael Adam <obnox@redhat.com> * apps:glusterfs: ignore nonexistent block volumes in deletion of block-hosting volume Don't fail the deletion of a block-hosting volume if it references a block volume id, that does not exist in the database. Signed-off-by: Michael Adam <obnox@redhat.com> * fix glide install failure * update the free size in block info on expansion If a volume is expanded and has a block flag set to true, then we should also update the free size attribute of the volume in blockinfo. Signed-off-by: Raghavendra Talur <rtalur@redhat.com> * add test for block free size update on expansion Signed-off-by: Raghavendra Talur <rtalur@redhat.com> * executors: remove redundant err check conditional Signed-off-by: John Mulligan <jmulligan@redhat.com> * apps: add per-brick log line in brick destroy Signed-off-by: John Mulligan <jmulligan@redhat.com> * executors: return a specific error type for delete of absent volume Signed-off-by: John Mulligan <jmulligan@redhat.com> * apps: failing to delete a deleted/missing volume is OK If a delete of a volume errors out with the does-not-exist error treat it as a success condition and continue on with life. 
Signed-off-by: John Mulligan <jmulligan@redhat.com> * executors: split logic for determining device and thinpool from brick Signed-off-by: John Mulligan <jmulligan@redhat.com> * executors: split out lv delete logic from brick delete func Signed-off-by: John Mulligan <jmulligan@redhat.com> * executors: split out thin pool item counting logic from delete func Signed-off-by: John Mulligan <jmulligan@redhat.com> * apps: add ability to store and fetch lvm properties of brick entry Now that cloned volumes exist, we need to track the LVM properties of the brick independently of the brick ids. The alternative would be to backtrack from known values (the brick path) but this would not be reliable in the case of partial deletes. Signed-off-by: John Mulligan <jmulligan@redhat.com> * apps: when creating and updating bricks record lvm properties Signed-off-by: John Mulligan <jmulligan@redhat.com> * executors: add lvm params to brick request struct Signed-off-by: John Mulligan <jmulligan@redhat.com> * apps: add common function for creating brick request for brick entry Signed-off-by: John Mulligan <jmulligan@redhat.com> * apps: set lvm properties when creating brick request from brick entry Signed-off-by: John Mulligan <jmulligan@redhat.com> * executors: use specified lvm values in brick request Instead of deriving all of the lvm names like heketi has traditionally done switch to using the new explicit lvm params. Signed-off-by: John Mulligan <jmulligan@redhat.com> * executors: do not fail when unmounting an unmounted brick Signed-off-by: John Mulligan <jmulligan@redhat.com> * executors: do not fail when deleting already deleted lvs Signed-off-by: John Mulligan <jmulligan@redhat.com> * functional tests: add tests for volume delete robustness Signed-off-by: John Mulligan <jmulligan@redhat.com> * apps: add per-brick log, if brick destroy fails Signed-off-by: Madhu Rajanna <mrajanna@redhat.com> * heketi-cli: fix output of heketi-cli device resync Previously, output was "Device updated". Now it is of the form "Device <DEVICE-ID> updated". Fixes heketi#1240 Signed-off-by: Michael Adam <obnox@redhat.com> * apps: check invariants for the free size of block hosting volumes Instead of "open coding" the logic involved in modifying the block hosting volume's free space, provide a function call on the volume entry that also checks some invariants using godbc. Signed-off-by: John Mulligan <jmulligan@redhat.com> * apps: treat cleanup and remove cases for block hosting vols different Unfortunately, block hosting volumes were structured different from regular volumes in where the volume was mostly configured (pre-exec vs. post-exec). With the introduction of operations this lead to some incorrect code reuse in the delete and rollback cases in that the create case does not update the block hosting volume size immediately (this is a bug too, IMO) but only in finalize. Thus trying to give space back to the block hosting volume in rollback was incorrect behavior. This somewhat hacky approach tries to fix this issue minimally by not adjusting the block hosting volume size on rollback. Future cleanups of the structure of block volume creation should try to sort out what should be set before the exec, but that's for another PR. 
Signed-off-by: John Mulligan <jmulligan@redhat.com> * build: add a check for mercurial/hg When running `make vendor` without `hg` in the path, the build fails with the following error: [WARN] Unable to checkout bitbucket.org/ww/goautoneg [ERROR] Update failed for bitbucket.org/ww/goautoneg: hg is not installed [ERROR] Failed to install: hg is not installed make: *** [Makefile:74: vendor] Error 1 Glide needs to fetch a repository from bitbucket.org, which is a Mercurial hosting site. The `hg` executable needs to be available to download the requires pkgs. Reported-by: Madhu Rajanna <mrajanna@redhat.com> Signed-off-by: Niels de Vos <ndevos@redhat.com> * apps:glusterfs: set the "group gluster-block" options when manually creating block-hosting volumes Fixes heketi#1226 Signed-off-by: Michael Adam <obnox@redhat.com> * StorageSet should update total, free and used size Signed-off-by: Raghavendra Talur <rtalur@redhat.com> * use StorageSet in device resync op Signed-off-by: Raghavendra Talur <rtalur@redhat.com> * executors: delete path needs to check raw errors from lvm commands When this function was refactored in preparation for robust delete behavior the "friendly" error message was moved into the utility function used for counting the number of lvs in a thin pool. However, the errors from this function need to be checked to see if lvm errors out because the thin pool is missing vs. another reason. Changing the raw lvm error to a friendly error in this case is wrong. Move the friendly error back to the upper function call. Fixes issue heketi#1242 Signed-off-by: John Mulligan <jmulligan@redhat.com> * jwt: disable iat claim in jwt From dgrijalva/jwt-go#139 it is understood that if the machine where jwt token is generated and/or the machine where jwt token is verified have any clock skew then there is a possibility of getting a "Token used before issued" error. Considering that we also check for expiration with delta of 5 minutes, disabling iat claim until the patch is merged in jwt. Signed-off-by: Raghavendra Talur <rtalur@redhat.com> * glusterfs: improve the logging in error path of DeleteBricksWithEmptyPath Signed-off-by: Michael Adam <obnox@redhat.com> * glusterfs: add an empty line between functions in db_operations.go Signed-off-by: Michael Adam <obnox@redhat.com> * tests: add test to ensure number(Hosts) equals ha count Signed-off-by: Raghavendra Talur <rtalur@redhat.com> * apps: only provide hosts equal to ha count Signed-off-by: Raghavendra Talur <rtalur@redhat.com> * executors: prevent (vg|pv)remove commands from prompting If there is an unexpected condition on the vg or pv when removing a device it is preferable to fail than to allow the command to prompt for input which will block heketi "forever" and possibly (due to lvm-level) locking other commands as well. Fixes heketi#1244 Signed-off-by: John Mulligan <jmulligan@redhat.com> * app: apply gluster-block group option only once Instead of applying gluster-block group option implicitly in request, apply them only once based on info.block bool. In subsequent patches, we will provide a way to control option setting and separate the option setting from info.block value. Signed-off-by: Raghavendra Talur <rtalur@redhat.com> * app: hold blockhostingvolumeoptions in a variable Signed-off-by: Raghavendra Talur <rtalur@redhat.com> * app: provide config option to edit blockhostingvolume options In the json config the key is block_hosting_volume_options and the env var to override it is HEKETI_BLOCK_HOSTING_VOLUME_OPTIONS. 
* go-client:client: add retryOperationDo function
  Added a new function, retryOperationDo, to keep re-sending a request to heketi when a too-many-requests error (429) is returned. A small delay is added between requests to avoid overloading the server (a sketch of this pattern follows this list). Also added code to read the complete response body and close it, so the http connection can be reused.
  Signed-off-by: Madhu Rajanna <mrajanna@redhat.com>
* client api: make the retry do func args match standard do args
  Make the arguments of the retry-on-429-error do function match those of the basic do function.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* client api: swap in go client with retries
  Switch general callers of the api to use the do method with retries on 429. The reason for allowing the do function to be overridden is primarily for testing purposes; however, there is no exported method for changing it (yet).
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* client api: update comments
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* client api: combine retry behavior control under one options type
  Combine the behavioral control over the client in a single ClientOptions type. This type is able to regulate all current and future core behaviors of the heketi client. From this point on we should not need to keep adding extra New* functions; simply extend the public attributes of the ClientOptions type. ClientOptions is responsible for controlling the enabling of retries, the number of retries, and how much to delay between retries.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
  xxx: make everything configurable
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* client api: lower retry count
  Lower the retry count to 6 so that retries time out in under 3 minutes (with default maximums). Instead of having the client effectively wait forever, this gives the client a chance to give up and return an error to the caller.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* client api: don't suppress real http errors
  Even though the client may have internally retried the request, it is still better to return the original error/failing http response to the caller so that the caller can react intelligently to the error. It also preserves symmetry with the behavior of doBasic.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* client api: add test for retrying on server throttle responses
  This adds a testing middleware for faking the conditions of a server throttling client requests, and a test function that exercises the basic behaviors of a client getting throttled.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: move functions to manage operations into new file
  Move functions that handle mid-level operations management into a separate source file, as operations.go is simply getting too huge.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: add func for writing http error responses for operations
  Add a function for writing the http error responses based on the error returned by the AsyncHttpOperation utility function. In the future this will be used to select appropriate http error codes based on the internal error instead of always returning 500.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
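The retry-on-429 behaviour described above can be summarized in a few lines. A sketch under assumptions: the function and parameter names are invented (heketi's real client method differs), and it assumes the request can safely be re-sent, i.e. has no consumed body.

```go
package client

import (
	"io"
	"io/ioutil"
	"net/http"
	"time"
)

// doWithRetries re-issues a request while the server answers 429,
// draining each throttled response body so the connection is reused.
func doWithRetries(c *http.Client, req *http.Request, retries int, delay time.Duration) (*http.Response, error) {
	var resp *http.Response
	var err error
	for i := 0; i <= retries; i++ {
		resp, err = c.Do(req)
		if err != nil || resp.StatusCode != http.StatusTooManyRequests {
			return resp, err // success or a "real" error: hand it to the caller
		}
		io.Copy(ioutil.Discard, resp.Body)
		resp.Body.Close()
		time.Sleep(delay)
	}
	// Still throttled: return the last 429 response, not a synthetic error,
	// so the caller can react to the original failure.
	return resp, err
}
```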
* apps: add a specific error condition for throttling on operations
  Establish an internal error condition for the system that can be returned if the server is already too busy handling existing operations. This error is converted to the http error code "429 - Too Many Requests" so that clients are informed it is valid to retry the operation later.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: add a type to track how many operations a server is handling
  (A sketch of such a counter follows this list.)
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: add a configuration option to control operations throttle limit
  Add a limit to define the maximum desired number of operations that will be processed at one time by the heketi server.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: configure the app's operations counter
  Using the new configuration value, set up an operations counter for the server's app.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: add client throttling based on operations in-flight
  Use the OpCounter type to throttle clients by rejecting new operations when the server is already busy processing existing operations. All operations that use the AsyncHttpOperation function will be throttled, as well as node and device removal (setting state when the new state is failed). These are the activities that most occupy the server.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: add basic unit tests for op counter type
  Add some simple unit tests that cover the general behaviors of the operations counter type.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* functional tests: set TestSmokeTest's max_inflight_operations
  Give the TestSmokeTest an explicit, lower-than-default value for max_inflight_operations. This is intended to help exercise the throttling behavior of both the client and server on the normal paths.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: add a new rest endpoint for querying operations info
  This very simple api is meant (currently) for test code and allows the client to check how many in-flight operations and pending operations exist in the db. Eventually, this may become part of an administrative command in heketi-cli, but that is currently out of scope.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* client api: add an api function for getting operations info
  This allows the go client api to query the heketi server for information about operations.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* functional tests: check operations count when setting up cluster
  When running the setupCluster function, check the operation counts from the server as an invariant that no operations should exist on the server.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* functional tests: add test cases for operations throttling
  Add two test cases for the behavior of operations throttling. These cases check that operations are rejected by the server when more requests come in than are allowed, and that the system can return both permanent and too-many-requests type errors.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* functional tests: check device & node removes work during throttle
  Add a test to verify that throttling works cleanly when encountering a "busy system" returning throttle errors.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* functional tests: let the tests run for 2 hours
  The addition of a complex functional test that checks throttle behavior seems to have caused the tests to start failing in centos-ci. Increase the timeout to verify that this is the cause.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
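A sketch of what such an in-flight operations counter could look like; the type and method names here are assumptions, not heketi's actual code:

```go
package glusterfs

import (
	"errors"
	"sync"
)

var ErrTooManyOperations = errors.New("server busy: too many in-flight operations")

type opCounter struct {
	mu    sync.Mutex
	count uint64
	limit uint64 // e.g. from the max_inflight_operations config option
}

// Inc admits a new operation or reports that the server is saturated;
// the HTTP layer would map the error to 429 Too Many Requests.
func (o *opCounter) Inc() error {
	o.mu.Lock()
	defer o.mu.Unlock()
	if o.count >= o.limit {
		return ErrTooManyOperations
	}
	o.count++
	return nil
}

// Dec releases a slot once the operation finishes or is rolled back.
func (o *opCounter) Dec() {
	o.mu.Lock()
	defer o.mu.Unlock()
	if o.count > 0 {
		o.count--
	}
}
```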
* functional tests: add some debugging stuff
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: only consider available hosts for ha count
  Previously, the code iterated over only ha-count hosts, whether or not they were available. This change instead stops iterating once the number of available hosts equals the ha count.
  Signed-off-by: Sven Anderson <sven@redhat.com>
* tests: add test for HA with unavailable hosts
  Signed-off-by: Sven Anderson <sven@redhat.com>
* functional tests: discover and fix another level of timeout in test
  It turns out there is more than one level of timeout that needs to be adjusted. Fix it here, and print a specific message if the test script is timed out.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: avoid throttling test code much
  Some of the tests make a lot of requests at once, and throttling has a huge impact on them (1min vs 4min), so we avoid doing much throttling at all in the test code.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* typo: Administration instead of Adminstration on topics menu points
* apps: shuffle ha nodes to avoid relying on same nodes
  When the number of nodes available is greater than the HA count, we want to select a subset of these nodes for use with gluster block ha support. Previous changes to the code try to select the ha-count number of healthy nodes from this set, but if the healthy node set was unchanging it would always pick the exact same nodes. This change shuffles the list of nodes that is tested for liveness, so different block volumes are distributed across different nodes.
  Signed-off-by: Sven Anderson <sven@redhat.com>
* apps: add a test for getting different ha hosts each call
  Verify that every time a block volume is created where the number of valid hosts is greater than the number of ha hosts, the system selects different sets of ha hosts.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
  Signed-off-by: Sven Anderson <sven@redhat.com>
* apps: add proper seeding of the PRNG to main function
  This change adds proper seeding with the crypto/rand package, which uses /dev/urandom or similar entropy sources. It falls back to time seeding if those are not available (a sketch follows this list).
  Signed-off-by: Sven Anderson <sven@redhat.com>
* apps:glusterfs: reserve 1% of raw capacity in FreeSpace tracking
  A block-hosting volume created with raw size X cannot host block files of accumulated size X, because the file system itself uses a certain small amount of the raw size, but the FreeSpace tracking currently does not take this into account. Hence trying to create a block volume of size X with FreeSize X available on the block hosting volume will fail. This patch fixes heketi to always track only 99% of the raw space of a newly created (or expanded) block hosting volume as the available FreeSize. Thus the announced free space will always be able to host content of that size.
  Signed-off-by: Michael Adam <obnox@redhat.com>
* apps:glusterfs: remove a warning message
  This was a leftover message for debugging. Fixes: heketi#1292
  Signed-off-by: Michael Adam <obnox@redhat.com>
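The PRNG-seeding commit above translates to something like the following sketch; the function name is invented, the stdlib calls are real:

```go
package main

import (
	cryptorand "crypto/rand"
	"encoding/binary"
	mathrand "math/rand"
	"time"
)

// seedPRNG seeds math/rand from the kernel entropy source and falls
// back to wall-clock time when crypto/rand is unavailable.
func seedPRNG() {
	var b [8]byte
	if _, err := cryptorand.Read(b[:]); err == nil {
		mathrand.Seed(int64(binary.LittleEndian.Uint64(b[:])))
		return
	}
	mathrand.Seed(time.Now().UnixNano())
}
```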
* apps:glusterfs: deallocate space on device if volume creation fails
  If volume creation fails for some reason, then as part of volume creation cleanup the space allocated for the bricks needs to be freed on the device when the bricks are removed from the volume.
  Signed-off-by: Madhu Rajanna <mrajanna@redhat.com>
* apps: added test case to check device space reclaimed
  Added a test case to check that device space is reclaimed if volume creation fails.
  Signed-off-by: Madhu Rajanna <mrajanna@redhat.com>
* apps: added test case to check device space reclaimed
  Added a test case to check that device space is reclaimed if block volume creation fails.
  Signed-off-by: Madhu Rajanna <mrajanna@redhat.com>
* apps:glusterfs: update volume hosts info during node deletion
  During node delete in heketi, we remove the node from heketi management, but volume info such as Hosts may still contain the deleted node's hostname, so we need to update the backup-volfile-servers and the volume mountpoint (which may also contain the deleted node's hostname). If it is a block hosting volume, we need to fetch all the block volumes belonging to it and update the hosts in those volumes as well, so that the next block volume creation fetches the new hosts from the heketi cluster.
  Signed-off-by: Madhu Rajanna <mrajanna@redhat.com>
* apps: update bounds checking unit tests for volume expand
  The test for a volume size that is too small wasn't actually testing the volume expand api, and there was no test for a number too high to fit in the struct. This change resolves both problems.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* keep error msg to single line
  Signed-off-by: Raghavendra Talur <rtalur@redhat.com>
* middleware: implement new claims type and validation function
  Due to the inability of the jwt library to handle clock skew, we implement a derived claims type that validates the "iat" claim with leeway (a sketch follows this list). As a side effect you see the following changes:
  1. required_claims is embedded in the validation function, and the missing-claims error is replaced with unauthorized.
  2. As the claims type has changed from a map to a struct, claims are now accessed as members and no longer indexed.
  The leeway for "iat" is set to 120 seconds by default. It can only be changed through an env variable: HEKETI_JWT_IAT_LEEWAY_SECONDS.
  Signed-off-by: Raghavendra Talur <rtalur@redhat.com>
* tests: fix comments in other tests
  Signed-off-by: Raghavendra Talur <rtalur@redhat.com>
* apps: create a status field for pending operation entries
  This status field will be used by the heketi server to track the status of the various pending operations. Currently the only states are "new" (created by this server instance) and "stale" (created by some other server instance and not being worked on).
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: add function to mark all pending op entries stale
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* server: add infrastructure to mark pending ops stale on server restart
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: add test of marking pending ops stale on "restart"
  Add a test case that fakes a server restart by calling the reset function explicitly and checking that pending ops in the db become marked as "stale".
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
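Since this is the commit that works around the missing leeway support discussed in this thread, a sketch of such a derived claims type may be useful. The 120-second default and the HEKETI_JWT_IAT_LEEWAY_SECONDS override are from the commit text; the type and its validation logic below are illustrative, not heketi's actual implementation:

```go
package middleware

import (
	"errors"
	"time"

	jwt "github.com/dgrijalva/jwt-go"
)

type heketiClaims struct {
	jwt.StandardClaims
	leeway int64 // seconds of tolerated clock skew for "iat"; default 120
}

// Valid replaces the library's own checks with an iat check that
// tolerates leeway; exp is still verified, without leeway, as before.
func (c *heketiClaims) Valid() error {
	now := time.Now().Unix()
	if c.IssuedAt != 0 && now+c.leeway < c.IssuedAt {
		return errors.New("token used before issued")
	}
	if c.ExpiresAt != 0 && now >= c.ExpiresAt {
		return errors.New("token is expired")
	}
	return nil
}
```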
* apps: add a function to summarize counts of pending operation states
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: extend api to report on operation counts by status
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: reject unwanted potential parallel BHV creates
  When a block volume create operation is started, check that there are no pending block hosting volumes being created. If there are, reject the request with the too-many-requests http code. This avoids creating multiple block hosting volumes when only one should be created (assuming it can fit the subsequent request).
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: add a test for rejecting parallel BHV creation
  Add a test that verifies that parallel block-hosting-volume creates will be rejected.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: add a test that bhv blocking correctly ignores some items
  Add a test that the rejection of creating new block-hosting-volumes does not consider stale pending block hosting volumes, or pending non-block hosting volumes, as reasons to reject a block volume request that needs a new block-hosting-volume.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: change Modify(Free|Reserved)Size functions to return error
  Change the VolumeEntry's ModifyFreeSize and ModifyReservedSize functions to return errors instead of panicking, and plumb the error handling up to the callers of these functions. This prevents wrong values from being saved to the db without killing the server; the admin can then stop the server at a later time and repair the db as needed.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: add test for free size error checking
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: add db upgrade func to correct bad block hosting free sizes
  Add a function that runs during db upgrade to find and try to correct any bad block hosting volume free size counts. This function will not update the free size to something outside the valid range, so if the size is bad but the correction makes no sense, it will leave things as is.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: add tests for upgrade block-hosting free-size fix func
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* main: print the error in case starting the app failed
  In case initialization of the app fails, the message "ERROR: Unable to start application" is not very helpful. There might be an actual error that can be displayed; if there is one, it should be logged.
  Signed-off-by: Niels de Vos <ndevos@redhat.com>
* apps: don't waste entropy for seeding
  Seed values that have the same remainder when divided by 2^31-1 generate the same pseudo-random sequence, so there is no need to waste entropy generating a 64-bit random number (a short demonstration follows this list).
* apps: remove effectless UTC() call, because UnixNano() ignores the timezone
* heketi cli: add "server operations info" command
  The command `heketi-cli server operations info` will list a brief summary that counts the operations in various states. Example:
  ```
  $ ./heketi-cli server operations info
  Operation Counts:
    Total: 3
    In-Flight: 0
    New: 0
    Stale: 3
  ```
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* utils: use logger to capture ssh key setup errors
  Instead of just blatting some errors out to stdout (not stderr), use the nice logger that was already in the package.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
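The remainder claim in the entropy commit above is easy to check: Go's math/rand reduces its seed modulo 2^31-1 internally, so two seeds in the same residue class yield identical streams.

```go
package main

import (
	"fmt"
	"math/rand"
)

func main() {
	const m = 1<<31 - 1 // math/rand reduces seeds modulo this value
	a := rand.New(rand.NewSource(42))
	b := rand.New(rand.NewSource(42 + m))
	fmt.Println(a.Int63() == b.Int63()) // true: identical sequences
}
```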
* functional test: added test cases for block volumes
  Added TestBlockVolume to run functional test cases for block volumes.
  Signed-off-by: Madhu Rajanna <mrajanna@redhat.com>
* api: support colon chars in device paths
  Adjust the regex to allow device paths containing colons. Colons may appear in paths such as "/dev/disk/by-path/pci-0000:00:10.0-scsi-0:0:1:0" and thus are valid use cases (an illustrative check follows this list). Fixes heketi#1246
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* go client: test that device names can contain colons
  Adjust the device names used in the test to check that the server api accepts names containing colons.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* docker: use a non-blocking flock in heketi-start.sh
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* api: add api types needed for server admin states
  Add new types to the api that will be used to get and set a heketi server's administrative state.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* server: new package to manage server's administrative state
  Create the server/admin package to track and control a heketi server's administrative state. Currently all the state does is control which clients are allowed to make modifying requests:
  - normal: all clients
  - read-only: no clients
  - local-only: only localhost
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* main: add server admin state to middleware & routes
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* server: allow server admin state to be reset by signal
  In a pinch it may be necessary (also for testing) to reset the server's state back to normal even after it has been set to read-only. Allow SIGUSR2 to reset the server's admin state to normal.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* client: add go client api support for admin status get & set
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* client: add tests for admin status in go client api
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* functional tests: add tests for heketi server admin modes
  Add a suite of functional tests covering various behaviors of the heketi server's administrative modes, including sending the server a SIGUSR2 signal to reset the state to default.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* heketi cli: add commands to get and set server admin modes
  The command `heketi-cli server mode get` prints the server's current administrative mode. The command `heketi-cli server mode set VALUE` sets it. Supported modes are "normal", "local-client", and "read-only".
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* executors: format arbiter bricks differently
  Add support to the brick request to tell the executor that a brick file system should be formatted in one fashion or another. Define normal and arbiter formatting types, and use the arbiter formatting type to create a file system that allows for more inodes.
  Pair-Programmed-With: John Mulligan <jmulligan@redhat.com>
  Signed-off-by: Raghavendra Talur <rtalur@redhat.com>
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: add new brick sub-type field to determine brick formatting
  Add a new brick entry field for the brick's "sub-type", a semi-opaque value that can be used to determine any special formatting for that brick.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
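An illustrative version of the colon-friendly path check described above; the actual pattern heketi uses may differ:

```go
package main

import (
	"fmt"
	"regexp"
)

// Allow ':' alongside the usual absolute-path characters.
var devPathRe = regexp.MustCompile(`^/[a-zA-Z0-9_.:/-]+$`)

func main() {
	fmt.Println(devPathRe.MatchString("/dev/disk/by-path/pci-0000:00:10.0-scsi-0:0:1:0")) // true
	fmt.Println(devPathRe.MatchString("relative/path"))                                   // false: must be absolute
}
```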
* apps: support assigning brick subtype in placers
  Placers must determine the brick's subtype, because it is the role of the placer to carve bricks out of the underlying storage, and they have the knowledge needed to make that determination.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: use subtype to determine the brick formatting type
  When the brick subtype is arbiter, specify arbiter formatting for the executor. If the subtype is normal, use normal formatting. If the subtype is unknown and we're creating a brick request for a new brick, panic to detect misuse of the api (a sketch of this mapping follows this list).
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* functional test: add checks for how arbiter brick was formatted
  Add checks to some of the arbiter functional tests to verify that the inode percentage was set differently on the arbiter brick than on the other bricks.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: add a db feature flag for sub-type field on brick entries
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps:glusterfs: reserve 2% of raw capacity in FreeSpace tracking
  For larger block volumes, reserving 1% of raw capacity is not enough, as the block volume delete operation in gluster-block also requires some memory to store its metadata during deletion. If we reserve 2% of raw capacity, we can create and delete block volumes of larger sizes. Tested with 10GB, 50GB, 100GB, 200GB, 500GB and 1TB block volumes.
  Signed-off-by: Madhu Rajanna <mrajanna@redhat.com>
* functional/smoke test: an updated test case for block volumes
  After reserving 2% of raw capacity, we should be able to create and delete block volumes of larger sizes. Updated the block hosting volume size to 200 in heketi.json and added different scenarios for block volume creation.
  Signed-off-by: Madhu Rajanna <mrajanna@redhat.com>
* apps:glusterfs: improve log message if requested size is greater than the free size
  Log both the requested block size and the block hosting volume's free size.
  Signed-off-by: Madhu Rajanna <mrajanna@redhat.com>
* apps:glusterfs: improve error message
  If the requested block size is greater than the reservedRaw size, return an error message that contains the reservedRaw size.
  Signed-off-by: Madhu Rajanna <mrajanna@redhat.com>
* apps:glusterfs: fix logger format for go 1.11
  DeviceInfo.Id is actually a string, which makes the %d formatter an error with the upcoming go 1.11. Use %v for consistency.
* apps: keep passing on block volume error conditions in rollback
  By failing to pass on the error conditions when trying to remove a failed block volume creation, we hide the error from the operations framework, which causes it to remove the pending operation entry; we then lose valuable debugging information and the ability to possibly revert the operation automatically later.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: test error hit during block volume create rollback
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
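The subtype-to-format decision described above is essentially a three-way switch. A sketch with assumed type and constant names (heketi's real identifiers may differ):

```go
package glusterfs

type BrickSubType int
type FormatType int

const (
	NormalSubType BrickSubType = iota
	ArbiterSubType
)

const (
	FormatNormal FormatType = iota
	FormatArbiter // file system created with a higher inode ratio
)

// formatTypeFor maps a brick's sub-type to the executor's format type,
// panicking on unknown values to surface API misuse early.
func formatTypeFor(s BrickSubType) FormatType {
	switch s {
	case NormalSubType:
		return FormatNormal
	case ArbiterSubType:
		return FormatArbiter
	default:
		panic("unknown brick sub-type")
	}
}
```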
* apps: change block volumes to allocate size in Build step
  Previously, block volumes were only reducing the free space of the block hosting volume in the Finalize step of the operation. This was incorrect behavior, because it meant that heketi could allow two concurrent block volume creations to essentially occupy the same free space; one of the operations would then fail during the Exec phase with a no-space error in gluster-block. This change moves the logic of reserving space to the Build phase, and changes the Rollback behavior to expect the space to be taken on the BHV and return it in the cleanup method.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: add a test to demonstrate that hosting volumes can be overused
  This test checks that when two concurrent block volume requests are received, and the two block volumes would occupy the same space on the block hosting volume, one of the requests is rejected.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* functional tests: add a test for concurrent block volume creates
  This test checks that when block volume requests are received concurrently, all requests are satisfied even though the initial block hosting volume cannot hold all of them. The test will fail if the Build phase of the operation allows multiple block volumes in the same space on one block hosting volume; in that case the gluster-block command(s) fail with an out-of-space error.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* stop testing go 1.7.5 as part of travis ci
  Go version 1.7.5 is older than any version of go on the platforms we normally ship on, and it should not be necessary to regularly test on this version any more.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* Revert "stop testing go 1.7.5 as part of travis ci"
  This reverts commit ecf2df9.
* executors: prevent pvcreate/vgcreate/lvcreate from displaying an interactive prompt
  In certain circumstances LVM commands can prompt the user for a confirmation. Users will not be able to answer the prompt when Heketi executes the commands. By passing the `-qq` parameter to the LVM commands, the default answer for the prompt will be "no" and the command exits with a failure. Fixes: gluster/gluster-kubernetes#497
  Reported-by: Ulf Lilleengen <lulf@redhat.com>
  Signed-off-by: Niels de Vos <ndevos@redhat.com>
* docker: load an initial topology if HEKETI_TOPOLOGY_FILE is set
  By providing the environment variable HEKETI_TOPOLOGY_FILE with a filename that is available in the container, the database gets populated with this topology file. In Kubernetes environments HEKETI_TOPOLOGY_FILE can be provided as a ConfigMap or Secret and mounted as a volume in the pod's specification. With this change it is now possible to deploy Heketi with a .yaml file that contains both the topology.json and the Heketi Deployment.
  Signed-off-by: Niels de Vos <ndevos@redhat.com>
* api: define a new type to restrict usage of block hosting volumes
  A new type and corresponding property on the volume's BlockInfo struct will be used to restrict whether new block volumes can be placed on a block hosting volume. This can be used by admins for migrating off of one block hosting volume. It will also be used by the system to mark "sketchy" block hosting volumes post-upgrade.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: deny placing a block volume on a restricted BHV
  If the value is anything other than unrestricted, the system will reject putting a new block volume on the BHV.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: add a test case to check block vols on restricted BHVs
  Add a test case to check that a new block volume will not be placed on a BHV that is restricted.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: add a new operation for changing the BHV's restriction state
  A restriction can be placed on a BHV immediately, but a restriction can only be removed if certain conditions are met. Thus the operation allows the user to set the Locked restriction in the Build method of the operation, while clearing a restriction is only done in Finalize.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: add tests for new BHV restriction set operation
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* api: add a request type and validation for setting a restriction
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: add a new endpoint for setting a restriction on a BHV
  This endpoint allows the admin to set the "" (unrestricted) or "locked" values on a BHV, via the recently added operation.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: add an upgrade function that tries to update reserved size on BHVs
  Add a new upgrade function that attempts to update the reserved size on all block hosting volumes in the db. If the expected amount of reserved size can't be taken, the system will set the "locked-by-update" restriction on the BHV. This restriction can be cleared by the admin, but only if certain preconditions are met (see the change that adds the operation). If the values seen on BHVs are not sane, the restriction will also be set.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: add tests for upgrade func that updates reserved sizes
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* python api: add python client apis for block volumes
  In order to write some upgrade tests (which are in python), add apis to access block volumes.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* functional tests: test volumes & block volumes post upgrade from 6.x
  Add a new db snapshot from 6.x with both volumes and block volumes, and test that they can be created and deleted post-upgrade.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* client: add client library for setting block restriction
  Signed-off-by: Raghavendra Talur <rtalur@redhat.com>
* client: cli changes for BHV lockdown
  Signed-off-by: Raghavendra Talur <rtalur@redhat.com>
* api: tweak output representation of no restriction
  Change the output for the unrestricted state (internally "", the empty string) to say "(none)". By convention, the parens indicate the value is virtual.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* api: add test for setting new block volume restriction
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* executors: use a specific no-exist err for block volume delete
  Return a specific error type from the executor if the gluster-block command fails but reports that the volume we are attempting to delete does not exist (a sketch follows this list).
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: don't fail if a volume doesn't exist for a block vol delete
  If the executor layer reports that the volume we are deleting doesn't exist, do not fail the function. The end result is the same in both cases: the volume we were trying to delete is gone (robust delete for block volumes).
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* apps: add a test for insufficient hosts on HA count = 3
  Add a test case that checks that the system fails as expected if there are too few working nodes when an ha count of 3 is given. Also checks the recent change to treat a does-not-exist error from gluster-block as a good exit condition for block volume delete. This tests that when the exec step fails due to ha count issues, the overall operation is cleaned up.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
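The robust-delete pair of commits above hinges on a dedicated error type the caller can recognize. A sketch with invented names, not heketi's actual types:

```go
package executors

import "fmt"

// BlockVolumeDoesNotExistErr signals that gluster-block reported the
// target volume as already gone.
type BlockVolumeDoesNotExistErr struct{ Name string }

func (e *BlockVolumeDoesNotExistErr) Error() string {
	return fmt.Sprintf("block volume %s does not exist", e.Name)
}

// ignoreMissing shows the caller side: in the delete path,
// "already gone" counts as success.
func ignoreMissing(err error) error {
	if _, ok := err.(*BlockVolumeDoesNotExistErr); ok {
		return nil
	}
	return err
}
```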
* apps: add UpdateVolumeInfoComplete()
  A function for filtering the pending block volumes out of the volume info of block hosting volumes.
  Signed-off-by: Michael Adam <obnox@redhat.com>
* apps: add a unit test for UpdateVolumeInfoComplete()
  Signed-off-by: Michael Adam <obnox@redhat.com>
* apps: expose only completed block volumes through VolumeInfo
  Originally, pending block volumes were also listed in the volume info for a block hosting volume. This change filters those pending volumes out.
  Signed-off-by: Michael Adam <obnox@redhat.com>
* apps:glusterfs: fix a comment typo for ListCompleteVolumes
  Signed-off-by: Michael Adam <obnox@redhat.com>
* executors: fix method used to extract known errs on block vol delete
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* executors: duplicate output on stderr for gluster-block delete
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* functional tests: add a test to check that deleting deleted BVs succeeds
  Adds a test case to verify that a block volume previously deleted at the gluster-block layer can subsequently be deleted from heketi.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* functional tests: add a test for error conditions on block delete
  Add a test case that tries deleting block volumes during a bad state, in order to verify that heketi does not lose a block volume due to bad error handling.
  Signed-off-by: John Mulligan <jmulligan@redhat.com>
* kubeexec: match the stderr output of kubeexec to that of sshexec
  Kubeexec was adding extra text to stderr before sending it back to the caller. In cases where the stderr text has to be parsed, this was leading to parse errors. It is sufficient to log the details of the pod where the commands were executed; let the error string be passed as-is to the caller.
  Signed-off-by: Raghavendra Talur <rtalur@redhat.com>
* Fix error message in glusterfs.DbCreate()
* Update the Dockerfile for release 8
  Signed-off-by: John Mulligan <jmulligan@redhat.com>

Co-authored-by: Michael Adam <obnox@redhat.com>
Co-authored-by: John Mulligan <jmulligan@redhat.com>
Co-authored-by: Niels de Vos <ndevos@redhat.com>
Co-authored-by: zhengjiajin <zhengjiajin@caicloud.io>
Co-authored-by: Raghavendra Talur <rtalur@redhat.com>
Co-authored-by: Madhu Rajanna <mrajanna@redhat.com>
Co-authored-by: Sven Anderson <sven@redhat.com>
Co-authored-by: bloublou <bloublou2014@gmail.com>
Co-authored-by: Yann Hodique <yhodique@google.com>
Co-authored-by: Michael Adam <obnox@samba.org>
See #131