Store fixes; Windows compile #247
Signed-off-by: John Howard <jhoward@microsoft.com>

- First, the store timeout is woefully low. Bumped to 20 seconds from 2 seconds. This may fix Azure#242 (comment). IMO, as only test code calls it non-blocked, why even have a block parameter to Lock()? IMO also, why a timeout at all? They're always fraught with error and machine timing.
- Presence of a key should be checked using `raw, ok := hvs.data[key]`, not the current nil check.
- ErrKeyNotFound should be returned if the store file does not exist. It shouldn't ignore that error.
- Now correctly reports when a timeout occurred, and when a non-blocking lock attempt is made while already locked.
- Serial pattern abuse in not always closing the lock file.
- Some golang correctness (errors should be lower case).
- go build ./... actually passes on Windows now - various compile errors previously.
- golang pattern conformance `if err:=<test>; err!=nil {....`
- Simplified timeout duration (no need for time.Duration(...))
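To make the key-presence point concrete, here is a minimal sketch of the difference between a nil check and the two-value map lookup. `memStore` and its `[]byte` values are hypothetical stand-ins chosen for illustration, not the store's actual types; only the `ErrKeyNotFound` name comes from the PR.

```go
package main

import "errors"

// ErrKeyNotFound reuses the error name from the PR description.
// Everything else in this block is a hypothetical stand-in.
var ErrKeyNotFound = errors.New("key not found")

// memStore is an illustrative in-memory map, not the real store type.
type memStore struct {
	data map[string][]byte
}

// readNilCheck treats a nil value as "missing", so a key that was stored
// with a nil value cannot be told apart from a key that was never stored.
func (s *memStore) readNilCheck(key string) ([]byte, error) {
	raw := s.data[key]
	if raw == nil {
		return nil, ErrKeyNotFound
	}
	return raw, nil
}

// readCommaOk uses the two-value lookup, which reports presence directly.
func (s *memStore) readCommaOk(key string) ([]byte, error) {
	raw, ok := s.data[key]
	if !ok {
		return nil, ErrKeyNotFound
	}
	return raw, nil
}
```

With a map that holds `"present": nil`, `readNilCheck("present")` reports `ErrKeyNotFound` while `readCommaOk("present")` succeeds, which is the distinction the bullet is about.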
|
@sharmasushant PTAL. @jterry75 FYI - a once over would be appreciated. From @PatrickLang via email: That change definitely seems to be scaling better. Can you open a PR? Before I would typically see a gap of a few minutes where no pods could start, then one more every few minutes. There were a lot of retries on the lock error, and I haven’t noticed any yet. kubectl get rs -w This is by no means a full test pass, but it looks better than before. |
|
@sharmasushant A question, not knowing the architecture from where the store is invoked. Is it called from multiple external processes simultaneously? Or from different threads of a single process? Whereby each process/thread (goroutine) is accessing the -same- store file. If multiple -processes- are accessing the same store, then this, while better, is still not safe. I strongly recommend someone move this across to bbolt regardless as that's both multi-thread and multi-process safe. |
|
@jhowardmsft it is accessed from different processes (different invocations of the cni binary). Sure, we can look into bbolt and evaluate it. Thanks a lot for the suggestion. |
  plugin.Store, err = store.NewJsonFileStore(platform.CNIRuntimePath + plugin.Name + ".json")
  if err != nil {
- log.Printf("[cni] Failed to create store, err:%v.", err)
+ log.Printf("[cni] Failed to create store: %v.", err)
@jhowardmsft - I know it's not you, but log messages in other golang projects typically have no trailing punctuation.
Also. Typically the "[cni]" portion would be done via a logrus context or equivalent. (Again just FYI)
Yeah, but I'm not changing the world here :)
// Copyright 2017 Microsoft. All rights reserved.
// MIT License

// +build linux
Does this work? I thought it had to be above the header?
Certainly does in 1.11
...that lists the conditions under which a file should be included in the package. Constraints may appear in any kind of source file (not just Go), but they must appear near the top of the file, preceded only by blank lines and other line comments. These rules mean that in Go files a build constraint must appear before the package clause.
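The placement rule quoted above can be approximated in a few lines. This is a rough sketch, assuming Go 1.16 or later for the standard `go/build/constraint` package; `plusBuildBeforePackage` is a hypothetical helper and deliberately simplified (it ignores block comments and the requirement that a blank line follow the constraint), so it is not a substitute for what the toolchain actually does.

```go
package main

import (
	"go/build/constraint"
	"strings"
)

// plusBuildBeforePackage reports whether src contains a "// +build" line
// that is preceded only by blank lines and // line comments.
// Simplified approximation: block comments and the trailing blank line
// that the toolchain also expects are not checked here.
func plusBuildBeforePackage(src string) bool {
	for _, line := range strings.Split(src, "\n") {
		line = strings.TrimSpace(line)
		switch {
		case line == "":
			continue // blank lines may precede the constraint
		case constraint.IsPlusBuild(line):
			return true // found the constraint before any code
		case strings.HasPrefix(line, "//"):
			continue // other line comments, such as a copyright header
		default:
			return false // reached the package clause or other code first
		}
	}
	return false
}
```

Under this approximation, a copyright header followed by a blank line and `// +build linux` passes, while a constraint placed after the `package` clause does not.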
// Copyright 2017 Microsoft. All rights reserved.
// MIT License

// +build linux
Should we rename? endpoint_linux.go specifically implies to me "// +build linux" by convention.
Oh, you are just doing this because it's already included by default via the naming?
I was just being consistent with what's there.
if !block || i == lockMaxRetries {
    return ErrStoreLocked
if !block {
Again not your code. But a better pattern would be to have:
func TryLock() bool -> Non-Blocking return value indicates wasLocked
func Lock() -> Blocking
func Unlock() -> Same for both
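The three-method shape suggested above might look roughly like this. It is a sketch under stated assumptions: `fileLock` is a hypothetical type, the lock is an exclusively created file, the 10ms poll interval is arbitrary, and `TryLock` returns an extra `error` (a variation on the `bool`-only signature in the comment) so that real I/O failures are not mistaken for "already locked".

```go
package main

import (
	"errors"
	"os"
	"time"
)

// fileLock is a hypothetical lock backed by an exclusively created file.
type fileLock struct {
	path string
	f    *os.File
}

// TryLock makes a single attempt and never blocks.
// It returns true only when this caller acquired the lock.
func (l *fileLock) TryLock() (bool, error) {
	f, err := os.OpenFile(l.path, os.O_CREATE|os.O_EXCL|os.O_RDWR, 0o644)
	if err != nil {
		if errors.Is(err, os.ErrExist) {
			return false, nil // another holder has the lock
		}
		return false, err
	}
	l.f = f
	return true, nil
}

// Lock blocks until the lock is acquired, polling TryLock.
func (l *fileLock) Lock() error {
	for {
		ok, err := l.TryLock()
		if err != nil {
			return err
		}
		if ok {
			return nil
		}
		time.Sleep(10 * time.Millisecond) // arbitrary poll interval
	}
}

// Unlock closes and removes the lock file. It is the same for both paths.
func (l *fileLock) Unlock() error {
	if l.f == nil {
		return errors.New("not locked")
	}
	err := l.f.Close()
	l.f = nil
	if rmErr := os.Remove(l.path); err == nil {
		err = rmErr
	}
	return err
}
```

Splitting the API this way removes the `block` flag from call sites: callers that must not wait use `TryLock`, everyone else uses `Lock`.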
jterry75
left a comment
LGTM. Minor comments and improvements unrelated to change.
|
ping @sharmasushant Can you review please? This seems better than it was. I'm working on a larger fix to entirely remove the store, replacing it with bolt. Almost have it ready for submission. @DavidSchott FYI |
|
ping @sharmasushant - this seems more stable. Can we merge so it's not blocking other improvements? |
|
@tamilmani1989 Can you please look? |
|
@tamilmani1989 - can you help here? We have a lot of people working on Windows container stability over the next few months. The faster we can get changes in, the better. |
|
@PatrickLang Ya sure. I will do it in a day or two. |
tamilmani1989
left a comment
Did we test these changes on Linux and make sure that it's not breaking anything?
  // Maximum number of retries before failing a lock call.
- lockMaxRetries = 20
+ lockMaxRetries = 200
Is there a reason for changing this to 200?
The timeout (2s) was woefully small. 20 seconds is still reasonably low. Arguably, there should be no need whatsoever for a timeout in a correctly operating system. In fact, in a subsequent change I have which removes the store entirely and moves this to boltdb, there is no timeout.
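The arithmetic behind the two figures, as a small sketch. The 100ms value for `lockRetryDelay` is an assumption inferred from the thread (20 retries giving 2 seconds); the constant names with `old`/`new` prefixes are illustrative only. It also shows why no `time.Duration(...)` conversion is needed when the retry count is an untyped constant.

```go
package main

import "time"

const (
	// Assumed delay between lock attempts, inferred from "20 retries = 2s".
	lockRetryDelay = 100 * time.Millisecond

	// Retry counts before and after the change shown in the diff.
	oldLockMaxRetries = 20
	newLockMaxRetries = 200

	// An untyped integer constant multiplies a time.Duration directly,
	// so no explicit time.Duration(...) conversion is required.
	oldLockTimeout = oldLockMaxRetries * lockRetryDelay // 2s
	newLockTimeout = newLockMaxRetries * lockRetryDelay // 20s
)
```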
ok
  if err != nil {
      if os.IsNotExist(err) {
-         return nil
+         return ErrKeyNotFound
This doesn't break anything when CNI/CNM is called for the first time after boot, when there is no state file, right?
Right, but it's a question of correctness, not obfuscation. The store package should behave and return errors as expected - a read of something which doesn't exist should return a not-found error.
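A sketch of the behaviour being argued for: a read against a store file that does not exist surfaces as a not-found error instead of a silent success. `readKey` and the flat JSON layout are hypothetical and only illustrate the error mapping; the `ErrKeyNotFound` name and the `os.IsNotExist` check come from the diff.

```go
package main

import (
	"encoding/json"
	"errors"
	"os"
)

// ErrKeyNotFound reuses the error name from the diff.
var ErrKeyNotFound = errors.New("key not found")

// readKey is a hypothetical helper that reads one key from a JSON file
// holding a flat object. The real store's layout may differ.
func readKey(path, key string) (json.RawMessage, error) {
	raw, err := os.ReadFile(path)
	if err != nil {
		if os.IsNotExist(err) {
			// No store file means the key cannot exist: report that,
			// rather than returning nil and hiding the condition.
			return nil, ErrKeyNotFound
		}
		return nil, err
	}
	data := map[string]json.RawMessage{}
	if err := json.Unmarshal(raw, &data); err != nil {
		return nil, err
	}
	value, ok := data[key]
	if !ok {
		return nil, ErrKeyNotFound
	}
	return value, nil
}
```

A caller running for the first time after boot can then treat `ErrKeyNotFound` as "no saved state yet" explicitly, which is the scenario the reviewer asks about.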
yeah I got it. I'm just making sure we tested that scenario?
        return err
    time.Sleep(lockRetryDelay)
}
defer lockFile.Close()
I'm not sure how defer works in a loop. Just clarifying: if open is called multiple times, does close get called for each open?
The defer of close is outside the loop.... It's only called at function exit, and only in the case that the open succeeded.
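The retry-then-defer shape described in that answer can be sketched as follows. `withLockFile`, `errStoreLocked`, and the exclusive-create locking are assumptions made for illustration, not the PR's code; the point is only where the `defer` sits relative to the loop.

```go
package main

import (
	"errors"
	"os"
	"time"
)

// errStoreLocked is a hypothetical stand-in for the store's lock error.
var errStoreLocked = errors.New("store is locked")

// withLockFile retries an exclusive create, then runs fn while holding
// the lock file. Hypothetical helper for illustration only.
func withLockFile(path string, maxRetries int, delay time.Duration, fn func() error) error {
	var lockFile *os.File
	for i := 0; ; i++ {
		f, err := os.OpenFile(path, os.O_CREATE|os.O_EXCL|os.O_RDWR, 0o644)
		if err == nil {
			lockFile = f
			break // exactly one successful open leaves the loop
		}
		if !errors.Is(err, os.ErrExist) {
			return err // unexpected failure: nothing was opened
		}
		if i == maxRetries {
			return errStoreLocked // timed out: nothing was opened
		}
		time.Sleep(delay)
	}
	// Both defers are registered once, after the loop, and only on the
	// path where the open succeeded. They run at function exit in
	// reverse order: Close first, then Remove.
	defer os.Remove(path)
	defer lockFile.Close()
	return fn()
}
```

Failed attempts never produce a file handle, so there is nothing to close for them; a `defer` placed inside the loop would instead queue one call per iteration until the function returns.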
sorry! I missed it
@tamilmani1989 (See internal email too). The only thing I have been able to run is the unit tests, and ask Patrick to do some ad-hoc testing, but that's only on Windows. I would need to rely on you guys to verify this change fully. |
|
@jhowardmsft - We are still working on running tests automatically when a PR is opened against master. For now, we have to do it manually and make sure a change doesn't break either Linux or Windows. So we expect whoever opens a PR to at least do sanity testing and make sure it doesn't break anything. Basic Tests:
I can help you/Patrick run these tests on Linux. |
I would need a LOT of assistance here. As previously mentioned, I know next to nothing about how this is supposed to work, how to setup or configure an environment. You would have to assume I am starting from scratch. I would also prefer a means to verify in a local VM/VMs rather than requiring Azure VMs to do this testing. |
|
How did you test it for windows then? Didn't you create any kubernetes cluster? |
Patrick did in Kubernetes - see his previous comment and the internal email. I only ran go test ./... and neither have a Kubernetes setup nor would know what to do with one if I did. |
|
@PatrickLang Have you done these tests on Linux? If not, can you please do so? |
|
Unfortunately I can't do any more testing on it this week as I'm out at a conference. @jackfrancis or @khenidak can you help out here? I would also be open to merging this but only producing a Windows binary for now if there isn't enough testing on Linux. ACS-Engine can deploy different versions for Windows and Linux. |
|
@PatrickLang @jhowardmsft If you give me the Linux binaries azure-vnet and azure-vnet-ipam, I will test it when I'm done with my current work. |
|
@tamilmani1989 Attaching azure-vnet and azure-vnet-ipam binaries for linux |
|
@sharmasushant I have tested it. LGTM. |
|
Can we merge this? I'm ready to submit part 2 - moving to a bolt database. |
Signed-off-by: John Howard <jhoward@microsoft.com>

Move store to bbolt database

This PR is a follow-on to Azure#247. @tamilmani1989 @sharmasushant PTAL. @PatrickLang, @DavidSchott @dineshgovindasamy @madhanrm @jterry75 FYI. @msuiche perhaps you are able to perform more verification on this as well?

As per Azure#247 (comment), while that PR was better, it was far from perfect. This PR replaces the store entirely and uses a bolt database to store the data. See Azure#247 (comment) Azure#247 (comment)

Patrick gave me access to one of his Windows clusters to perform verification. While there were some errors, none appear attributable to this change. I was able to scale from 1 to 25, back to 1 and back up again. Hopefully this is finally the end of those lock store-related errors. It does not, however, mean that scaling is now free of errors. I will leave that to others to investigate...

I have NOT been able to test this against a Linux node - perhaps @tamilmani1989 would be able to do that as before.

In addition, this PR has a bunch of commits which fix (most) vendoring issues in this repo. There is still more to do there, but again, I will leave that for others to resolve. I had to tackle vendoring to some extent to pull in bbolt.

Finally, there are two other commits in this PR.
- I have put in an implementation of GetLastRebootTime on Windows. As its implementation changes the startup functionality, I have left that effectively stubbed out for someone else to follow through with.
- I hit a SIGSEGV in testing in UpdateSendAndReport. Made that safe.
Here's the 25 pods scaling up-and-down on Patrick's cluster:
```
NAME                        READY   STATUS    RESTARTS   AGE     IP            NODE           NOMINATED NODE
iis-1803-687cdddf9f-28vrj   1/1     Running   0          10m     10.240.0.10   13833k8s9000   <none>
iis-1803-687cdddf9f-5fjcn   1/1     Running   0          6m33s   10.240.0.30   13833k8s9000   <none>
iis-1803-687cdddf9f-6dk28   1/1     Running   0          10m     10.240.0.31   13833k8s9000   <none>
iis-1803-687cdddf9f-6j8wg   1/1     Running   0          6m33s   10.240.0.11   13833k8s9000   <none>
iis-1803-687cdddf9f-8f5kc   1/1     Running   1          6m33s   10.240.0.14   13833k8s9000   <none>
iis-1803-687cdddf9f-bkd7n   1/1     Running   0          10m     10.240.0.28   13833k8s9000   <none>
iis-1803-687cdddf9f-bth4v   1/1     Running   0          6m33s   10.240.0.23   13833k8s9000   <none>
iis-1803-687cdddf9f-csm2x   1/1     Running   0          10m     10.240.0.5    13833k8s9000   <none>
iis-1803-687cdddf9f-dtvqp   1/1     Running   1          6m33s   10.240.0.9    13833k8s9000   <none>
iis-1803-687cdddf9f-fv9rn   1/1     Running   1          6m33s   10.240.0.20   13833k8s9000   <none>
iis-1803-687cdddf9f-gmzcz   1/1     Running   1          6m33s   10.240.0.12   13833k8s9000   <none>
iis-1803-687cdddf9f-kzmcf   1/1     Running   0          10m     10.240.0.7    13833k8s9000   <none>
iis-1803-687cdddf9f-lltjr   1/1     Running   1          6m33s   10.240.0.13   13833k8s9000   <none>
iis-1803-687cdddf9f-lx2vf   1/1     Running   0          10m     10.240.0.26   13833k8s9000   <none>
iis-1803-687cdddf9f-nn9pp   1/1     Running   1          6m33s   10.240.0.21   13833k8s9000   <none>
iis-1803-687cdddf9f-pjcws   1/1     Running   1          6m33s   10.240.0.22   13833k8s9000   <none>
iis-1803-687cdddf9f-q7hsf   1/1     Running   1          6m33s   10.240.0.33   13833k8s9000   <none>
iis-1803-687cdddf9f-qn5c7   1/1     Running   0          10m     10.240.0.27   13833k8s9000   <none>
iis-1803-687cdddf9f-rt6r5   1/1     Running   1          6m33s   10.240.0.17   13833k8s9000   <none>
iis-1803-687cdddf9f-s2jsb   1/1     Running   0          10m     10.240.0.8    13833k8s9000   <none>
iis-1803-687cdddf9f-sgwb8   1/1     Running   1          6m33s   10.240.0.25   13833k8s9000   <none>
iis-1803-687cdddf9f-x9tpt   1/1     Running   0          10m     10.240.0.29   13833k8s9000   <none>
iis-1803-687cdddf9f-xf6x9   1/1     Running   0          6m33s   10.240.0.24   13833k8s9000   <none>
iis-1803-687cdddf9f-xwfxg   1/1     Running   0          10m     10.240.0.15   13833k8s9000   <none>
iis-1803-687cdddf9f-zf8kv   1/1     Running   1          6m33s   10.240.0.16   13833k8s9000   <none>
azureuser@k8s-master-13833463-0:~/john$ curl http://10.240.0.16
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>IIS Windows Server</title>
<style type="text/css">
<!--
body { color:#000000; background-color:#0072C6; margin:0; }
margin-left:auto; margin-right:auto; text-align:center; }
a img { border:none; }
-->
</style>
</head>
<body>
<div id="container">
<a href="http://go.microsoft.com/fwlink/?linkid=66138&clcid=0x409"><img src="iisstart.png" alt="IIS" width="960" height="600" /></a>
</div>
</body>
</html>
```
Then scaling back down:
```
azureuser@k8s-master-13833463-0:~/john$ kubectl scale deploy iis-1803 --replicas=1
deployment.extensions/iis-1803 scaled
azureuser@k8s-master-13833463-0:~/john$
```
Some time later...
```
azureuser@k8s-master-13833463-0:~/john$ kubectl get pod -o wide
NAME                        READY   STATUS    RESTARTS   AGE   IP            NODE           NOMINATED NODE
iis-1803-687cdddf9f-x9tpt   1/1     Running   0          16m   10.240.0.29   13833k8s9000   <none>
```
And scaling back up again
```
azureuser@k8s-master-13833463-0:~/john$ kubectl scale deploy iis-1803 --replicas=25
deployment.extensions/iis-1803 scaled
azureuser@k8s-master-13833463-0:~/john$
```
Some time later...
```
azureuser@k8s-master-13833463-0:~/john$ kubectl get pod -o wide
NAME                        READY   STATUS    RESTARTS   AGE    IP            NODE           NOMINATED NODE
iis-1803-687cdddf9f-26p6l   1/1     Running   0          7m     10.240.0.11   13833k8s9000   <none>
iis-1803-687cdddf9f-2ktdz   1/1     Running   0          7m     10.240.0.10   13833k8s9000   <none>
iis-1803-687cdddf9f-48ggp   1/1     Running   0          7m     10.240.0.5    13833k8s9000   <none>
iis-1803-687cdddf9f-4gtmb   1/1     Running   0          7m     10.240.0.26   13833k8s9000   <none>
iis-1803-687cdddf9f-4hd72   1/1     Running   0          7m     10.240.0.24   13833k8s9000   <none>
iis-1803-687cdddf9f-4pwsq   1/1     Running   1          7m     10.240.0.33   13833k8s9000   <none>
iis-1803-687cdddf9f-5kw22   1/1     Running   0          7m     10.240.0.9    13833k8s9000   <none>
iis-1803-687cdddf9f-664z7   1/1     Running   1          7m     10.240.0.16   13833k8s9000   <none>
iis-1803-687cdddf9f-8swz7   1/1     Running   1          7m1s   10.240.0.7    13833k8s9000   <none>
iis-1803-687cdddf9f-9h98r   1/1     Running   1          7m     10.240.0.8    13833k8s9000   <none>
iis-1803-687cdddf9f-9h9jd   1/1     Running   1          7m     10.240.0.14   13833k8s9000   <none>
iis-1803-687cdddf9f-lftd7   1/1     Running   1          7m     10.240.0.19   13833k8s9000   <none>
iis-1803-687cdddf9f-m9knq   1/1     Running   1          7m     10.240.0.31   13833k8s9000   <none>
iis-1803-687cdddf9f-mplcc   1/1     Running   1          7m     10.240.0.21   13833k8s9000   <none>
iis-1803-687cdddf9f-p7jn2   1/1     Running   0          7m     10.240.0.20   13833k8s9000   <none>
iis-1803-687cdddf9f-sml2x   1/1     Running   0          7m1s   10.240.0.13   13833k8s9000   <none>
iis-1803-687cdddf9f-tjfws   1/1     Running   0          7m     10.240.0.18   13833k8s9000   <none>
iis-1803-687cdddf9f-vxdl4   1/1     Running   0          7m     10.240.0.15   13833k8s9000   <none>
iis-1803-687cdddf9f-x26vj   1/1     Running   1          7m1s   10.240.0.30   13833k8s9000   <none>
iis-1803-687cdddf9f-x2hll   1/1     Running   1          7m     10.240.0.28   13833k8s9000   <none>
iis-1803-687cdddf9f-x9tpt   1/1     Running   0          24m    10.240.0.29   13833k8s9000   <none>
iis-1803-687cdddf9f-xg5bm   1/1     Running   1          7m     10.240.0.23   13833k8s9000   <none>
iis-1803-687cdddf9f-zkkzm   1/1     Running   1          7m     10.240.0.32   13833k8s9000   <none>
iis-1803-687cdddf9f-zqv69   1/1     Running   0          7m     10.240.0.17   13833k8s9000   <none>
iis-1803-687cdddf9f-zvzn9   1/1     Running   0          7m     10.240.0.27   13833k8s9000   <none>
azureuser@k8s-master-13833463-0:~/john$
```
Signed-off-by: John Howard <jhoward@microsoft.com> Move store to bbolt database This PR is a follow on to Azure#247 @tamilmani1989 @sharmasushant PTAL. @PatrickLang, @DavidSchott @dineshgovindasamy @madhanrm @jterry75 FYI. @msuiche perhaps you are able to perform more verification on this as well? As per Azure#247 (comment), while that PR was better, it was far from perfect. This PR replaces the store entirely and uses a bolt database to store the data. See Azure#247 (comment) Azure#247 (comment) Patrick gave me access to one of his Windows clusters to perform verification. While there were some errors, none appear attributed to this change. I was able to scale from 1 to 25, back to 1 and back up again. Hopefully this is finally the end of those lock store-related errors. It is not however the end of no-errors-at-all during scaling. I will leave that to others to investigate... I have NOT been able to test this against a linux node - perhaps @tamilmani1989 would be able to that as per before. In addition, this PR has a bunch of commits which fix (most) vendoring issues in this repo. There is still more to do there, but again, I will leave that for others to resolve. I had to tackle vendoring to some extent to pull in bbolt. Finally, there are two other commits in this PR. - I have put in an implementation of GetLastRebootTime on Windows. As it's implementation changes the startup functionality, I have left that effectively stubbed out for someone else to follow through with. - I hit a SIGSEGV in testing in UpdateSendAndReport. Made that safe. 
Here's the 25 pods scaling up-and-down on Patricks cluster: ``` NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE iis-1803-687cdddf9f-28vrj 1/1 Running 0 10m 10.240.0.10 13833k8s9000 <none> iis-1803-687cdddf9f-5fjcn 1/1 Running 0 6m33s 10.240.0.30 13833k8s9000 <none> iis-1803-687cdddf9f-6dk28 1/1 Running 0 10m 10.240.0.31 13833k8s9000 <none> iis-1803-687cdddf9f-6j8wg 1/1 Running 0 6m33s 10.240.0.11 13833k8s9000 <none> iis-1803-687cdddf9f-8f5kc 1/1 Running 1 6m33s 10.240.0.14 13833k8s9000 <none> iis-1803-687cdddf9f-bkd7n 1/1 Running 0 10m 10.240.0.28 13833k8s9000 <none> iis-1803-687cdddf9f-bth4v 1/1 Running 0 6m33s 10.240.0.23 13833k8s9000 <none> iis-1803-687cdddf9f-csm2x 1/1 Running 0 10m 10.240.0.5 13833k8s9000 <none> iis-1803-687cdddf9f-dtvqp 1/1 Running 1 6m33s 10.240.0.9 13833k8s9000 <none> iis-1803-687cdddf9f-fv9rn 1/1 Running 1 6m33s 10.240.0.20 13833k8s9000 <none> iis-1803-687cdddf9f-gmzcz 1/1 Running 1 6m33s 10.240.0.12 13833k8s9000 <none> iis-1803-687cdddf9f-kzmcf 1/1 Running 0 10m 10.240.0.7 13833k8s9000 <none> iis-1803-687cdddf9f-lltjr 1/1 Running 1 6m33s 10.240.0.13 13833k8s9000 <none> iis-1803-687cdddf9f-lx2vf 1/1 Running 0 10m 10.240.0.26 13833k8s9000 <none> iis-1803-687cdddf9f-nn9pp 1/1 Running 1 6m33s 10.240.0.21 13833k8s9000 <none> iis-1803-687cdddf9f-pjcws 1/1 Running 1 6m33s 10.240.0.22 13833k8s9000 <none> iis-1803-687cdddf9f-q7hsf 1/1 Running 1 6m33s 10.240.0.33 13833k8s9000 <none> iis-1803-687cdddf9f-qn5c7 1/1 Running 0 10m 10.240.0.27 13833k8s9000 <none> iis-1803-687cdddf9f-rt6r5 1/1 Running 1 6m33s 10.240.0.17 13833k8s9000 <none> iis-1803-687cdddf9f-s2jsb 1/1 Running 0 10m 10.240.0.8 13833k8s9000 <none> iis-1803-687cdddf9f-sgwb8 1/1 Running 1 6m33s 10.240.0.25 13833k8s9000 <none> iis-1803-687cdddf9f-x9tpt 1/1 Running 0 10m 10.240.0.29 13833k8s9000 <none> iis-1803-687cdddf9f-xf6x9 1/1 Running 0 6m33s 10.240.0.24 13833k8s9000 <none> iis-1803-687cdddf9f-xwfxg 1/1 Running 0 10m 10.240.0.15 13833k8s9000 <none> iis-1803-687cdddf9f-zf8kv 1/1 
Running 1 6m33s 10.240.0.16 13833k8s9000 <none> azureuser@k8s-master-13833463-0:~/john$ curl http://10.240.0.16 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> <title>IIS Windows Server</title> <style type="text/css"> <!-- body { color:#000000; background-color:#0072C6; margin:0; } margin-left:auto; margin-right:auto; text-align:center; } a img { border:none; } --> </style> </head> <body> <div id="container"> <a href="http://go.microsoft.com/fwlink/?linkid=66138&clcid=0x409"><img src="iisstart.png" alt="IIS" width="960" height="600" /></a> </div> </body> </html> ``` Then scaling back down: ``` azureuser@k8s-master-13833463-0:~/john$ kubectl scale deploy iis-1803 --replicas=1 deployment.extensions/iis-1803 scaled azureuser@k8s-master-13833463-0:~/john$ ``` Some time later... ``` azureuser@k8s-master-13833463-0:~/john$ kubectl get pod -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE iis-1803-687cdddf9f-x9tpt 1/1 Running 0 16m 10.240.0.29 13833k8s9000 <none> ``` And scaling back up again ``` azureuser@k8s-master-13833463-0:~/john$ kubectl scale deploy iis-1803 --replicas=25 deployment.extensions/iis-1803 scaled azureuser@k8s-master-13833463-0:~/john$ ``` Some time later... 
``` zureuser@k8s-master-13833463-0:~/john$ kubectl get pod -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE iis-1803-687cdddf9f-26p6l 1/1 Running 0 7m 10.240.0.11 13833k8s9000 <none> iis-1803-687cdddf9f-2ktdz 1/1 Running 0 7m 10.240.0.10 13833k8s9000 <none> iis-1803-687cdddf9f-48ggp 1/1 Running 0 7m 10.240.0.5 13833k8s9000 <none> iis-1803-687cdddf9f-4gtmb 1/1 Running 0 7m 10.240.0.26 13833k8s9000 <none> iis-1803-687cdddf9f-4hd72 1/1 Running 0 7m 10.240.0.24 13833k8s9000 <none> iis-1803-687cdddf9f-4pwsq 1/1 Running 1 7m 10.240.0.33 13833k8s9000 <none> iis-1803-687cdddf9f-5kw22 1/1 Running 0 7m 10.240.0.9 13833k8s9000 <none> iis-1803-687cdddf9f-664z7 1/1 Running 1 7m 10.240.0.16 13833k8s9000 <none> iis-1803-687cdddf9f-8swz7 1/1 Running 1 7m1s 10.240.0.7 13833k8s9000 <none> iis-1803-687cdddf9f-9h98r 1/1 Running 1 7m 10.240.0.8 13833k8s9000 <none> iis-1803-687cdddf9f-9h9jd 1/1 Running 1 7m 10.240.0.14 13833k8s9000 <none> iis-1803-687cdddf9f-lftd7 1/1 Running 1 7m 10.240.0.19 13833k8s9000 <none> iis-1803-687cdddf9f-m9knq 1/1 Running 1 7m 10.240.0.31 13833k8s9000 <none> iis-1803-687cdddf9f-mplcc 1/1 Running 1 7m 10.240.0.21 13833k8s9000 <none> iis-1803-687cdddf9f-p7jn2 1/1 Running 0 7m 10.240.0.20 13833k8s9000 <none> iis-1803-687cdddf9f-sml2x 1/1 Running 0 7m1s 10.240.0.13 13833k8s9000 <none> iis-1803-687cdddf9f-tjfws 1/1 Running 0 7m 10.240.0.18 13833k8s9000 <none> iis-1803-687cdddf9f-vxdl4 1/1 Running 0 7m 10.240.0.15 13833k8s9000 <none> iis-1803-687cdddf9f-x26vj 1/1 Running 1 7m1s 10.240.0.30 13833k8s9000 <none> iis-1803-687cdddf9f-x2hll 1/1 Running 1 7m 10.240.0.28 13833k8s9000 <none> iis-1803-687cdddf9f-x9tpt 1/1 Running 0 24m 10.240.0.29 13833k8s9000 <none> iis-1803-687cdddf9f-xg5bm 1/1 Running 1 7m 10.240.0.23 13833k8s9000 <none> iis-1803-687cdddf9f-zkkzm 1/1 Running 1 7m 10.240.0.32 13833k8s9000 <none> iis-1803-687cdddf9f-zqv69 1/1 Running 0 7m 10.240.0.17 13833k8s9000 <none> iis-1803-687cdddf9f-zvzn9 1/1 Running 0 7m 10.240.0.27 13833k8s9000 <none> 
azureuser@k8s-master-13833463-0:~/john$ ```
Signed-off-by: John Howard <jhoward@microsoft.com> Move store to bbolt database This PR is a follow on to Azure#247 @tamilmani1989 @sharmasushant PTAL. @PatrickLang, @DavidSchott @dineshgovindasamy @madhanrm @jterry75 FYI. @msuiche perhaps you are able to perform more verification on this as well? As per Azure#247 (comment), while that PR was better, it was far from perfect. This PR replaces the store entirely and uses a bolt database to store the data. See Azure#247 (comment) Azure#247 (comment) Patrick gave me access to one of his Windows clusters to perform verification. While there were some errors, none appear attributed to this change. I was able to scale from 1 to 25, back to 1 and back up again. Hopefully this is finally the end of those lock store-related errors. It is not however the end of no-errors-at-all during scaling. I will leave that to others to investigate... I have NOT been able to test this against a linux node - perhaps @tamilmani1989 would be able to that as per before. In addition, this PR has a bunch of commits which fix (most) vendoring issues in this repo. There is still more to do there, but again, I will leave that for others to resolve. I had to tackle vendoring to some extent to pull in bbolt. Finally, there are two other commits in this PR. - I have put in an implementation of GetLastRebootTime on Windows. As it's implementation changes the startup functionality, I have left that effectively stubbed out for someone else to follow through with. - I hit a SIGSEGV in testing in UpdateSendAndReport. Made that safe. 
Here's the 25 pods scaling up-and-down on Patricks cluster: ``` NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE iis-1803-687cdddf9f-28vrj 1/1 Running 0 10m 10.240.0.10 13833k8s9000 <none> iis-1803-687cdddf9f-5fjcn 1/1 Running 0 6m33s 10.240.0.30 13833k8s9000 <none> iis-1803-687cdddf9f-6dk28 1/1 Running 0 10m 10.240.0.31 13833k8s9000 <none> iis-1803-687cdddf9f-6j8wg 1/1 Running 0 6m33s 10.240.0.11 13833k8s9000 <none> iis-1803-687cdddf9f-8f5kc 1/1 Running 1 6m33s 10.240.0.14 13833k8s9000 <none> iis-1803-687cdddf9f-bkd7n 1/1 Running 0 10m 10.240.0.28 13833k8s9000 <none> iis-1803-687cdddf9f-bth4v 1/1 Running 0 6m33s 10.240.0.23 13833k8s9000 <none> iis-1803-687cdddf9f-csm2x 1/1 Running 0 10m 10.240.0.5 13833k8s9000 <none> iis-1803-687cdddf9f-dtvqp 1/1 Running 1 6m33s 10.240.0.9 13833k8s9000 <none> iis-1803-687cdddf9f-fv9rn 1/1 Running 1 6m33s 10.240.0.20 13833k8s9000 <none> iis-1803-687cdddf9f-gmzcz 1/1 Running 1 6m33s 10.240.0.12 13833k8s9000 <none> iis-1803-687cdddf9f-kzmcf 1/1 Running 0 10m 10.240.0.7 13833k8s9000 <none> iis-1803-687cdddf9f-lltjr 1/1 Running 1 6m33s 10.240.0.13 13833k8s9000 <none> iis-1803-687cdddf9f-lx2vf 1/1 Running 0 10m 10.240.0.26 13833k8s9000 <none> iis-1803-687cdddf9f-nn9pp 1/1 Running 1 6m33s 10.240.0.21 13833k8s9000 <none> iis-1803-687cdddf9f-pjcws 1/1 Running 1 6m33s 10.240.0.22 13833k8s9000 <none> iis-1803-687cdddf9f-q7hsf 1/1 Running 1 6m33s 10.240.0.33 13833k8s9000 <none> iis-1803-687cdddf9f-qn5c7 1/1 Running 0 10m 10.240.0.27 13833k8s9000 <none> iis-1803-687cdddf9f-rt6r5 1/1 Running 1 6m33s 10.240.0.17 13833k8s9000 <none> iis-1803-687cdddf9f-s2jsb 1/1 Running 0 10m 10.240.0.8 13833k8s9000 <none> iis-1803-687cdddf9f-sgwb8 1/1 Running 1 6m33s 10.240.0.25 13833k8s9000 <none> iis-1803-687cdddf9f-x9tpt 1/1 Running 0 10m 10.240.0.29 13833k8s9000 <none> iis-1803-687cdddf9f-xf6x9 1/1 Running 0 6m33s 10.240.0.24 13833k8s9000 <none> iis-1803-687cdddf9f-xwfxg 1/1 Running 0 10m 10.240.0.15 13833k8s9000 <none> iis-1803-687cdddf9f-zf8kv 1/1 
Running 1 6m33s 10.240.0.16 13833k8s9000 <none> azureuser@k8s-master-13833463-0:~/john$ curl http://10.240.0.16 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> <title>IIS Windows Server</title> <style type="text/css"> <!-- body { color:#000000; background-color:#0072C6; margin:0; } margin-left:auto; margin-right:auto; text-align:center; } a img { border:none; } --> </style> </head> <body> <div id="container"> <a href="http://go.microsoft.com/fwlink/?linkid=66138&clcid=0x409"><img src="iisstart.png" alt="IIS" width="960" height="600" /></a> </div> </body> </html> ``` Then scaling back down: ``` azureuser@k8s-master-13833463-0:~/john$ kubectl scale deploy iis-1803 --replicas=1 deployment.extensions/iis-1803 scaled azureuser@k8s-master-13833463-0:~/john$ ``` Some time later... ``` azureuser@k8s-master-13833463-0:~/john$ kubectl get pod -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE iis-1803-687cdddf9f-x9tpt 1/1 Running 0 16m 10.240.0.29 13833k8s9000 <none> ``` And scaling back up again ``` azureuser@k8s-master-13833463-0:~/john$ kubectl scale deploy iis-1803 --replicas=25 deployment.extensions/iis-1803 scaled azureuser@k8s-master-13833463-0:~/john$ ``` Some time later... 
```
azureuser@k8s-master-13833463-0:~/john$ kubectl get pod -o wide
NAME                        READY   STATUS    RESTARTS   AGE     IP            NODE           NOMINATED NODE
iis-1803-687cdddf9f-26p6l   1/1     Running   0          7m      10.240.0.11   13833k8s9000   <none>
iis-1803-687cdddf9f-2ktdz   1/1     Running   0          7m      10.240.0.10   13833k8s9000   <none>
iis-1803-687cdddf9f-48ggp   1/1     Running   0          7m      10.240.0.5    13833k8s9000   <none>
iis-1803-687cdddf9f-4gtmb   1/1     Running   0          7m      10.240.0.26   13833k8s9000   <none>
iis-1803-687cdddf9f-4hd72   1/1     Running   0          7m      10.240.0.24   13833k8s9000   <none>
iis-1803-687cdddf9f-4pwsq   1/1     Running   1          7m      10.240.0.33   13833k8s9000   <none>
iis-1803-687cdddf9f-5kw22   1/1     Running   0          7m      10.240.0.9    13833k8s9000   <none>
iis-1803-687cdddf9f-664z7   1/1     Running   1          7m      10.240.0.16   13833k8s9000   <none>
iis-1803-687cdddf9f-8swz7   1/1     Running   1          7m1s    10.240.0.7    13833k8s9000   <none>
iis-1803-687cdddf9f-9h98r   1/1     Running   1          7m      10.240.0.8    13833k8s9000   <none>
iis-1803-687cdddf9f-9h9jd   1/1     Running   1          7m      10.240.0.14   13833k8s9000   <none>
iis-1803-687cdddf9f-lftd7   1/1     Running   1          7m      10.240.0.19   13833k8s9000   <none>
iis-1803-687cdddf9f-m9knq   1/1     Running   1          7m      10.240.0.31   13833k8s9000   <none>
iis-1803-687cdddf9f-mplcc   1/1     Running   1          7m      10.240.0.21   13833k8s9000   <none>
iis-1803-687cdddf9f-p7jn2   1/1     Running   0          7m      10.240.0.20   13833k8s9000   <none>
iis-1803-687cdddf9f-sml2x   1/1     Running   0          7m1s    10.240.0.13   13833k8s9000   <none>
iis-1803-687cdddf9f-tjfws   1/1     Running   0          7m      10.240.0.18   13833k8s9000   <none>
iis-1803-687cdddf9f-vxdl4   1/1     Running   0          7m      10.240.0.15   13833k8s9000   <none>
iis-1803-687cdddf9f-x26vj   1/1     Running   1          7m1s    10.240.0.30   13833k8s9000   <none>
iis-1803-687cdddf9f-x2hll   1/1     Running   1          7m      10.240.0.28   13833k8s9000   <none>
iis-1803-687cdddf9f-x9tpt   1/1     Running   0          24m     10.240.0.29   13833k8s9000   <none>
iis-1803-687cdddf9f-xg5bm   1/1     Running   1          7m      10.240.0.23   13833k8s9000   <none>
iis-1803-687cdddf9f-zkkzm   1/1     Running   1          7m      10.240.0.32   13833k8s9000   <none>
iis-1803-687cdddf9f-zqv69   1/1     Running   0          7m      10.240.0.17   13833k8s9000   <none>
iis-1803-687cdddf9f-zvzn9   1/1     Running   0          7m      10.240.0.27   13833k8s9000   <none>
azureuser@k8s-master-13833463-0:~/john$
```
Signed-off-by: John Howard <jhoward@microsoft.com>

Move store to bbolt database

This PR is a follow on to Azure#247

@tamilmani1989 @sharmasushant PTAL. @PatrickLang, @DavidSchott @dineshgovindasamy @madhanrm @jterry75 FYI. @msuiche perhaps you are able to perform more verification on this as well?

As per Azure#247 (comment), while that PR was better, it was far from perfect. This PR replaces the store entirely and uses a bolt database to store the data. See Azure#247 (comment) Azure#247 (comment)

Patrick gave me access to one of his Windows clusters to perform verification. While there were some errors, none appear attributed to this change. I was able to scale from 1 to 25, back to 1 and back up again. Hopefully this is finally the end of those lock store-related errors. It is not however the end of no-errors-at-all during scaling. I will leave that to others to investigate...

I have NOT been able to test this against a linux node - perhaps @tamilmani1989 would be able to do that as per before.

In addition, this PR has a bunch of commits which fix (most) vendoring issues in this repo. There is still more to do there, but again, I will leave that for others to resolve. I had to tackle vendoring to some extent to pull in bbolt.

Finally, there are two other commits in this PR.

- I have put in an implementation of GetLastRebootTime on Windows. As its implementation changes the startup functionality, I have left that effectively stubbed out for someone else to follow through with.
- I hit a SIGSEGV in testing in UpdateSendAndReport. Made that safe.

Here's the 25 pods scaling up-and-down on Patrick's cluster:

```
NAME                        READY   STATUS    RESTARTS   AGE     IP            NODE           NOMINATED NODE
iis-1803-687cdddf9f-28vrj   1/1     Running   0          10m     10.240.0.10   13833k8s9000   <none>
iis-1803-687cdddf9f-5fjcn   1/1     Running   0          6m33s   10.240.0.30   13833k8s9000   <none>
iis-1803-687cdddf9f-6dk28   1/1     Running   0          10m     10.240.0.31   13833k8s9000   <none>
iis-1803-687cdddf9f-6j8wg   1/1     Running   0          6m33s   10.240.0.11   13833k8s9000   <none>
iis-1803-687cdddf9f-8f5kc   1/1     Running   1          6m33s   10.240.0.14   13833k8s9000   <none>
iis-1803-687cdddf9f-bkd7n   1/1     Running   0          10m     10.240.0.28   13833k8s9000   <none>
iis-1803-687cdddf9f-bth4v   1/1     Running   0          6m33s   10.240.0.23   13833k8s9000   <none>
iis-1803-687cdddf9f-csm2x   1/1     Running   0          10m     10.240.0.5    13833k8s9000   <none>
iis-1803-687cdddf9f-dtvqp   1/1     Running   1          6m33s   10.240.0.9    13833k8s9000   <none>
iis-1803-687cdddf9f-fv9rn   1/1     Running   1          6m33s   10.240.0.20   13833k8s9000   <none>
iis-1803-687cdddf9f-gmzcz   1/1     Running   1          6m33s   10.240.0.12   13833k8s9000   <none>
iis-1803-687cdddf9f-kzmcf   1/1     Running   0          10m     10.240.0.7    13833k8s9000   <none>
iis-1803-687cdddf9f-lltjr   1/1     Running   1          6m33s   10.240.0.13   13833k8s9000   <none>
iis-1803-687cdddf9f-lx2vf   1/1     Running   0          10m     10.240.0.26   13833k8s9000   <none>
iis-1803-687cdddf9f-nn9pp   1/1     Running   1          6m33s   10.240.0.21   13833k8s9000   <none>
iis-1803-687cdddf9f-pjcws   1/1     Running   1          6m33s   10.240.0.22   13833k8s9000   <none>
iis-1803-687cdddf9f-q7hsf   1/1     Running   1          6m33s   10.240.0.33   13833k8s9000   <none>
iis-1803-687cdddf9f-qn5c7   1/1     Running   0          10m     10.240.0.27   13833k8s9000   <none>
iis-1803-687cdddf9f-rt6r5   1/1     Running   1          6m33s   10.240.0.17   13833k8s9000   <none>
iis-1803-687cdddf9f-s2jsb   1/1     Running   0          10m     10.240.0.8    13833k8s9000   <none>
iis-1803-687cdddf9f-sgwb8   1/1     Running   1          6m33s   10.240.0.25   13833k8s9000   <none>
iis-1803-687cdddf9f-x9tpt   1/1     Running   0          10m     10.240.0.29   13833k8s9000   <none>
iis-1803-687cdddf9f-xf6x9   1/1     Running   0          6m33s   10.240.0.24   13833k8s9000   <none>
iis-1803-687cdddf9f-xwfxg   1/1     Running   0          10m     10.240.0.15   13833k8s9000   <none>
iis-1803-687cdddf9f-zf8kv   1/1     Running   1          6m33s   10.240.0.16   13833k8s9000   <none>
azureuser@k8s-master-13833463-0:~/john$ curl http://10.240.0.16
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>IIS Windows Server</title>
<style type="text/css">
<!--
body { color:#000000; background-color:#0072C6; margin:0; }
margin-left:auto; margin-right:auto; text-align:center; }
a img { border:none; }
-->
</style>
</head>
<body>
<div id="container">
<a href="http://go.microsoft.com/fwlink/?linkid=66138&clcid=0x409"><img src="iisstart.png" alt="IIS" width="960" height="600" /></a>
</div>
</body>
</html>
```

Then scaling back down:

```
azureuser@k8s-master-13833463-0:~/john$ kubectl scale deploy iis-1803 --replicas=1
deployment.extensions/iis-1803 scaled
azureuser@k8s-master-13833463-0:~/john$
```

Some time later...

```
azureuser@k8s-master-13833463-0:~/john$ kubectl get pod -o wide
NAME                        READY   STATUS    RESTARTS   AGE     IP            NODE           NOMINATED NODE
iis-1803-687cdddf9f-x9tpt   1/1     Running   0          16m     10.240.0.29   13833k8s9000   <none>
```

And scaling back up again

```
azureuser@k8s-master-13833463-0:~/john$ kubectl scale deploy iis-1803 --replicas=25
deployment.extensions/iis-1803 scaled
azureuser@k8s-master-13833463-0:~/john$
```

Some time later...

```
azureuser@k8s-master-13833463-0:~/john$ kubectl get pod -o wide
NAME                        READY   STATUS    RESTARTS   AGE     IP            NODE           NOMINATED NODE
iis-1803-687cdddf9f-26p6l   1/1     Running   0          7m      10.240.0.11   13833k8s9000   <none>
iis-1803-687cdddf9f-2ktdz   1/1     Running   0          7m      10.240.0.10   13833k8s9000   <none>
iis-1803-687cdddf9f-48ggp   1/1     Running   0          7m      10.240.0.5    13833k8s9000   <none>
iis-1803-687cdddf9f-4gtmb   1/1     Running   0          7m      10.240.0.26   13833k8s9000   <none>
iis-1803-687cdddf9f-4hd72   1/1     Running   0          7m      10.240.0.24   13833k8s9000   <none>
iis-1803-687cdddf9f-4pwsq   1/1     Running   1          7m      10.240.0.33   13833k8s9000   <none>
iis-1803-687cdddf9f-5kw22   1/1     Running   0          7m      10.240.0.9    13833k8s9000   <none>
iis-1803-687cdddf9f-664z7   1/1     Running   1          7m      10.240.0.16   13833k8s9000   <none>
iis-1803-687cdddf9f-8swz7   1/1     Running   1          7m1s    10.240.0.7    13833k8s9000   <none>
iis-1803-687cdddf9f-9h98r   1/1     Running   1          7m      10.240.0.8    13833k8s9000   <none>
iis-1803-687cdddf9f-9h9jd   1/1     Running   1          7m      10.240.0.14   13833k8s9000   <none>
iis-1803-687cdddf9f-lftd7   1/1     Running   1          7m      10.240.0.19   13833k8s9000   <none>
iis-1803-687cdddf9f-m9knq   1/1     Running   1          7m      10.240.0.31   13833k8s9000   <none>
iis-1803-687cdddf9f-mplcc   1/1     Running   1          7m      10.240.0.21   13833k8s9000   <none>
iis-1803-687cdddf9f-p7jn2   1/1     Running   0          7m      10.240.0.20   13833k8s9000   <none>
iis-1803-687cdddf9f-sml2x   1/1     Running   0          7m1s    10.240.0.13   13833k8s9000   <none>
iis-1803-687cdddf9f-tjfws   1/1     Running   0          7m      10.240.0.18   13833k8s9000   <none>
iis-1803-687cdddf9f-vxdl4   1/1     Running   0          7m      10.240.0.15   13833k8s9000   <none>
iis-1803-687cdddf9f-x26vj   1/1     Running   1          7m1s    10.240.0.30   13833k8s9000   <none>
iis-1803-687cdddf9f-x2hll   1/1     Running   1          7m      10.240.0.28   13833k8s9000   <none>
iis-1803-687cdddf9f-x9tpt   1/1     Running   0          24m     10.240.0.29   13833k8s9000   <none>
iis-1803-687cdddf9f-xg5bm   1/1     Running   1          7m      10.240.0.23   13833k8s9000   <none>
iis-1803-687cdddf9f-zkkzm   1/1     Running   1          7m      10.240.0.32   13833k8s9000   <none>
iis-1803-687cdddf9f-zqv69   1/1     Running   0          7m      10.240.0.17   13833k8s9000   <none>
iis-1803-687cdddf9f-zvzn9   1/1     Running   0          7m      10.240.0.27   13833k8s9000   <none>
azureuser@k8s-master-13833463-0:~/john$
```
Signed-off-by: John Howard <jhoward@microsoft.com>
Fixes #242
First, the store timeout is woefully low. Bumped to 20 seconds from 2 seconds.
This appears to fix [cni] Timed out on locking store, err:Store is locked. #242 (comment)
IMO, as only test code calls it non-blocked, why even have a block parameter to Lock()?
IMO also, why a timeout at all? They're always fraught with error and machine timing.
Presence of a key should be checked using `raw, ok := hvs.data[key]`, not the current nil check.
ErrKeyNotFound should be returned if the store file does not exist. It shouldn't ignore that error.
Now correctly reports when a timeout occurred, and when a non-blocking lock attempt is made while the store is already locked.
Serial pattern abuse in not always closing the lock file.
Some golang correctness (errors should be lower case)
go build ./... actually passes on Windows now - various compile errors previously.
golang pattern conformance `if err:=<test>; err!=nil {....`
Take the mutex in GetModificationTime. Was not thread safe!
Simplified timeout duration (no need for time.Duration(...))
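For the duration simplification, a small sketch. The constant names are made up for the example and the exact expression previously in the code is assumed; the 20 second value is the one from this PR.

```go
package main

import (
	"fmt"
	"time"
)

const (
	// The conversion is redundant: an untyped constant multiplied by
	// time.Second is already a time.Duration.
	verboseTimeout = time.Duration(20) * time.Second

	// Equivalent and easier to read.
	lockTimeout = 20 * time.Second
)

func main() {
	fmt.Println(verboseTimeout == lockTimeout, lockTimeout)
}
```

The explicit `time.Duration(...)` conversion is still needed when the multiplier is a typed variable such as an `int`, just not for a literal.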