Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Setup refactory #78

Merged
merged 10 commits into from
Jul 2, 2020
Merged

Conversation

jborean93
Copy link
Collaborator

SUMMARY

This is a major refactor of the PowerShell setup module. On testing on a test VM, the time taken to run the full set of subsets has gone from 8.5 seconds down to 3.6 seconds.

It changes the following;

Breaking Changes
  • Will now include the IPv6 scope on a link local address for ansible_ip_addresses
    • This was done so we are not omitting any information that might be useful
  • The ansible_processor list will return the index before the name and manufacturer just like the POSIX facts
    • This was done so we align with the behaviour of the Python setup module
    • It is doubtful this value is used in any runtime logic
  • The ansible_date_time.epoch will return the seconds since EPOCH in UTC like the POSIX module
    • This was done so we align with the behaviour of the Python setup module
Features
  • Major refactor to speed up the process, limited the use of WMI where possible
  • Implemented the gather_timeout option to restrict the time taken for an individual fact subset
  • Added ansible_architecture2 to be more like the POSIX fact
Bugfixes
  • Remove the usage of WMI for a standard user to speed up the execution time and remove access denied errors.

Fixes #26
Fixes #29
Fixes #31
Fixes #42

ISSUE TYPE
  • Bugfix Pull Request
  • Feature Pull Request
COMPONENT NAME

setup

@jborean93
Copy link
Collaborator Author

The sanity failures requires a fix in ansible-test due to the unique nature of the setup.ps1 module ansible/ansible#70376.

@jborean93
Copy link
Collaborator Author

@briantist I am planning on merging this in soon before I create the first release of this collection. If you do fine any time it would be great if you could test out the changes here.

@briantist
Copy link
Contributor

@briantist I am planning on merging this in soon before I create the first release of this collection. If you do fine any time it would be great if you could test out the changes here.

I blocked out some time time tomorrow afternoon (US Eastern) to check it out, looks really nice.

The ansible_date_time.epoch will return the seconds since EPOCH in UTC like the POSIX module

This one I already know is going to hit us in a bunch of places; I have to look through them. Some are used as "unique" values so it probably won't matter, but others are being compared to local time output from other things.

In any case I support the breaking change; UTC makes more sense.

What do you think about adding a local_epoch as well? That would give a really straightforward replacement.

@jborean93
Copy link
Collaborator Author

Having a local equivalent sounds fine to me. That was a hairy one where I wasn’t sure if we should really change it. I opted to do the breaking change because:

  • the format of the value didn’t change, unlike the architecture one
  • the source of the value didn’t change, unlike some of the other WMI changes I had to keep because I couldn’t find an alternative source

Copy link
Contributor

@briantist briantist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Time difference was less dramatic for me but still faster. Using WinRM:

  • Full Set - Old Plugin: ~12s
  • Full Set - New Plugin: ~11s
  • Single subset - Old: ~9.1s
  • Single subset - New: ~9.0s

With PSRP:

  • Full Set - Old Plugin: ~11.7s
  • Full Set - New Plugin: ~9s
  • Single subset - Old: ~8.1s
  • Single subset - New: ~7.9s

Also of note the new plugin outputs the values where the old didn't. I think that's just a natural consequence of using the new module type. I think that's a positive change, might take some people by surprise though as plays using setup directly will see a lot more output than before with -v+.

Will reiterate the epoch_local would be very helpful and should be a one line change.

plugins/modules/setup.ps1 Show resolved Hide resolved
@jborean93
Copy link
Collaborator Author

Thanks for reviewing, weird that yours takes that long without a big reduction, I was finding it was a lot faster across the board on all my test hosts. I'm probably going to re-add the measuring times for each subsection so potentially we can find what the bottleneck is in your setup.

@jborean93
Copy link
Collaborator Author

I've pushed a change that adds the _measure_subset option, are you able to run this one more time with that set to True? By doing that you get the measure_info facts back that tell us how long each subset took to run.

"measure_info": {
    "all_ipv4_addresses, all_ipv6_addresses": 0.17186759999999998,
    "bios": 0.015571199999999999,
    "date_time": 0.015639,
    "distribution": 0.25006819999999996,
    "env": 0.01594,
    "facter": 0.09347119999999999,
    "interfaces": 0.17158779999999998,
    "local": 0,
    "memory": 0.20316209999999998,
    "platform": 0.4375382,
    "powershell_version": 0,
    "processor": 0.0781236,
    "uptime": 0.0156174,
    "user": 0.0156384,
    "virtual": 0.0156174,
    "windows_domain": 0.0156531,
    "winrm": 0.12498279999999999
},

Hopefully that can potentially tell us what may be running slower on your hosts.

@briantist
Copy link
Contributor

briantist commented Jul 2, 2020

Sorry to add extra work for you! I can tell you that the time taken is probably mostly overhead; none of my ansible connections to Windows hosts take under ~8 seconds for any task. This is probably latency on my connection to the hosts, possibly made worse by using Kerberos as authentication method.

But that looks interesting going to try it out with _measure_subset now.

@jborean93
Copy link
Collaborator Author

No worries, I'm actually in the middle of ripping out the runspace threading, it isn't adding any major benefit and probably adds more overhead that isn't needed.

@briantist
Copy link
Contributor

So here's the data (only using psrp this time):

  • Single subset (date_time only) total task time 7.95s:
"measure_info": {
    "date_time": 0.078124
}
  • Full set 9.3s:
"measure_info": {
    "all_ipv4_addresses, all_ipv6_addresses": 0.2795143,
    "bios": 0.0781275,
    "date_time": 0.1093798,
    "distribution": 0.6718657,
    "env": 0.1422718,
    "facter": 0.0781272,
    "interfaces": 0.2812267,
    "local": 0.0156266,
    "memory": 0.5312337,
    "platform": 1.062492,
    "powershell_version": 0,
    "processor": 0.1718491,
    "uptime": 0.0561348,
    "user": 0.0156268,
    "virtual": 0.015625399999999998,
    "windows_domain": 0.0159954,
    "winrm": 0.38933019999999996
}

Those add up to 3.9s, so quite respectable!

I guess it might be more surprising that the original one doesn't seem so slow in comparison, but we don't have the breakdown I guess.

For comparison, I also ran win_ping against the same host: 7.29s (against 7.77s for single and 9.33s for full setup in this same run).

@briantist
Copy link
Contributor

No worries, I'm actually in the middle of ripping out the runspace threading, it isn't adding any major benefit and probably adds more overhead that isn't needed.

That's a good point, the times are so small individually already. Is the data in measure_info including the runspace overhead or excluding it?

@jborean93
Copy link
Collaborator Author

Excluding, it's basically the time it takes to invoke the fact subset scriptblock. There's a problem with platform taking more than 10 seconds when an AWS vm starts up so I'm trying to solve that. I have a feeling the machine SID stuff is the culprit or maybe the reboot required check but I'm trying to narrow it down a bit.

@briantist
Copy link
Contributor

Interesting, how long after startup? All the machines I'm testing against are in AWS too

@jborean93
Copy link
Collaborator Author

I think it might be more initializing the host when it is first built. I'm only seeing the problem when using ansible-test in CI for 2019. If I was to reboot the host then things are fine, it's just the once which makes things difficult to debug.

@jborean93
Copy link
Collaborator Author

After extensive testing against multiple hosts I think this is ready for prime time. Thanks @briantist for testing out the changes here.

@jborean93 jborean93 merged commit c20a241 into ansible-collections:master Jul 2, 2020
@jborean93 jborean93 deleted the setup-refactory branch July 2, 2020 03:41
@mickaeltr
Copy link

mickaeltr commented Dec 15, 2020

Hello,
We are now seeing the following WARNING on some servers:

TASK [Gathering Facts] *********************************************************
[WARNING]: Error when collecting facter facts: Join-Path : Cannot bind argument to parameter 'Path' because it is an
empty string. At line:4 char:47 + $facterPath = Join-Path -Path $_ -ChildPath facter.ex ... +
~~ + CategoryInfo : InvalidData: (:) [Join-Path], ParameterBindingValidationException +
FullyQualifiedErrorId :
ParameterArgumentValidationErrorEmptyStringNotAllowed,Microsoft.PowerShell.Commands.JoinPathCommand at
, : line 4 at , : line 3

Do you think that could be related to that refactor?
Thanks

@jborean93
Copy link
Collaborator Author

@mickaeltr thanks for sharing the issue, I've opened #160 which fixes the problem there introduced by this change. In the future it would be best to just log a new issue and then link to the PR you believe causes the problem. We don't always look back at old PRs and closed issues so can miss these comments from time to time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment