Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve type completion #16875

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

MartinGC94
Copy link
Contributor

@MartinGC94 MartinGC94 commented Feb 11, 2022

PR Summary

Fixes #16798
Fixes #3277
Fixes #14146
Fixes #7287

Fixes the issue where namespaces would be removed from typenames with no regard for type name conflicts:

using namespace System.Threading
using namespace System.Timers
[Timer<Tab>

Now it will tab complete to the shortest non ambiguous name, in this case it would just be the full name but if a namespace alias or "using system" had been defined it would use that instead. Note it won't fall back to type accelerators or type aliases even if they would have been shorter. IMO It would be confusing if the entered type name changed, for example "System.Sing" turning info "float" and it would be annoying not being able to tab complete the real name if you for whatever reason wanted to use it.

Fixes the issue where using statements in $profile had no effect on the tab completion.
This is done by first checking the script text for using statements and if none are found, fall back to the session state. The idea here is that if you are writing a script inside an editor the using statements there should have priority over whatever is in your session state.

Fixes the issue where powershell defined types are only considered for tab completion if they are defined in the script text.
Now it will also include types from using module statements in the tab completion and types loaded into the session state.

Adds the ability to find types/namespaces based on subnamespaces.
For example you can type in Generic.List<Tab> to tab complete to System.Collections.Generic.List instead of either having to type the full name, or just list and select it from a long list of names.

Fixes the issue where quotes around using namespace statements aren't considered when removing the namespace from completion, for example:

using namespace "System.Management"
[token<Tab>

currently completes to [tomation.Language.Token] instead of the proper [Automation.Language.Token]

Limit the type completion suggestions to Attribute types when completing an attribute token, for example:

[Parameter<Tab>(
# Results in 1 result "Parameter" because the "(" makes the parser see it as an attribute
[Parameter<Tab>
# Results in 22 results

Increases and decreases the type completion performance a bit
Using the following snippet to test: Measure-Command { TabExpansion2 [s} | select Milliseconds
on fresh console launches it takes on average 177ms to complete. In my normal pwsh install it takes an average of 164.2ms.
Tab completing after the type cache has been built averages about 5,46ms with my new code and 9ms with my normal pwsh install. Interestingly there's a lot more variance in my code VS the old code. The old code sits consistently on 9ms while mine fluctuates between 4 and 6.
These results make sense to me because my type cache does more work in advance and saves more data but maybe some C# genius can optimize it so it's always faster?

The code was designed and tested around the type/namespace alias support that is currently waiting to get reviewed and hopefully merged here: #16734 the code specific to that change has been commented out, depending on what gets merged first it may need to get uncommented in a different PR.

On a side note, testing this code has made this PSReadline issue: PowerShell/PSReadLine#722 much more apparent to me because there's so many completion items with the same ListItemText for example, tab completing language gets me Microsoft.PowerShell.Commands.Language instead of my desired System.Management.Automation.Language. It should really look at the CompletionText instead, at least for types/namespaces since "apparent" duplicates are a lot more common here.

PR Context

PR Checklist

@iSazonov
Copy link
Collaborator

@MartinGC94 Is it possible to split the PR on some more simple and small to speed up the code review?

@MartinGC94
Copy link
Contributor Author

@iSazonov Not in a meaningful way IMO. The code is basically doing this:

  1. Extract interesting information from types to the type cache
  2. Build the typecache with this information
  3. Analyze script/session state for using statements and PS type definitions
  4. Use the type cache, using information and input to find the best/shortest completion names
  5. Create the completions

Aside from step 3, every step relies on the previous steps so a PR for step 2 would either not compile, or require the code from step 1 to be included which would just end up making step 5 look exactly like this PR. It would probably also be hard to judge the cache design on its own without seeing how it gets used.

Maybe a deeper explanation of what I've done and the design will help:
Practically all of the code in the types region has been deleted, so you can just ignore all the line removal/changes and just look at it as if that whole section was written from scratch.
Here's an example of what the type cache looks like:

TypeCache
├───TypeNameMap
│   ├───A
│   │   ├───AutolinkInline
│   │   │    └───Markdig.Syntax.Inlines.AutolinkInline
│   │   └───AutolinkInlineRenderer
│   │       ├───Markdig.Renderers.Normalize.Inlines.AutolinkInlineRenderer
│   │       └───Markdig.Renderers.Html.Inlines.AutolinkInlineRenderer
│   └───B
│       ├───BlankLineBlock
│       │    └───Markdig.Syntax.BlankLineBlock
│       └───Block
│           └───Markdig.Syntax.Block
├───NamespaceMap
│   └───A
│       ├───AutoLinks
│       │    └───Markdig.Extensions.AutoLinks
│       └───AutoIdentifiers
│           └───Markdig.Extensions.AutoIdentifiers
├───NamespaceContent
│   └───System.Management.Automation
│       ├───Namespaces
│       │    ├───System.Management.Automation.Tracing
│       │    └───System.Management.Automation.Security
│       └───Typenames
│           ├───PowerShellAssemblyLoadContextInitializer
│           └───PowerShellUnsafeAssemblyLoad
├───CompletionInfo
│   └───Markdig.IMarkdownExtension
│       ├───ListItemText
│       │   └───IMarkdownExtension
│       └───ToolTip
│           └───Markdig.IMarkdownExtension
├───TypeAcceleratorMap
│   └───Alias
│       ├───FullName
│       │   └───System.Management.Automation.AliasAttribute
│       ├───ListItemText
│       │   └───AliasAttribute
│       └───ToolTip
│           └───Class System.Management.Automation.AliasAttribute
└───RootNamespaces
    ├───Markdig
    ├───Microsoft
    └───System

TypeNameMap groups short type names by the first character in the short type name, then the short name itself and finally all the full names with that same type name.
NamespaceMap works just like the typename map, except instead of the short type name it uses the last sub namespace as the key.
NamespaceContent uses full namespace names as the key to find types/namespaces directly under that namespace.
CompletionInfo uses the full type names to find the static completion info (ListItemText and ToolTip)
TypeAcceleratorMap is similar to CompletionInfo except it uses the Type Accelerator name as the key and it also includes the full name. As I'm writing this I'm realizing an optimization here could be to just store the full name and then use the fullname to look up the completion info from the CompletionInfo.
RootNamespaces just contains a hashset of the root namespaces like "System".

The basic idea is to quickly be able to look up necessary data when I have a full name and to limit the scope of searches when I only have a partial name.

Here's a quick summary of the order I would review the methods in and what they do:
1: InitializeTypeCache - Finds types from 3 different sources, hands them off to a different method to extract interesting type info and adds them to the cache.
2: GetTypeInfoForCompletion - Extracts interesting type info using either the type or the full type name. The interesting info is the shortname, fullname, namespace, tooltip and list item text.
3: AddTypeToCache - Uses the type and namespace info from before to create and populate the different tables in the cache
4: GetUsingInfo - Finds using statements in the script or the session state and finds type definitions in the script text or modules specified with using module
5: CompleteType Uses the input to search the collected "using" info, type definitions and type cache for completions, utilizing GetShortestNonConflictingTypeName to well, get the shortest non conflicting type name for each found type.
6: CompleteNamespace I literally copy pasted CompleteType and removed all the type specific stuff so you've practically already reviewed this at this point.

@iSazonov
Copy link
Collaborator

@MartinGC94 Thanks for depth explanation!

Does InitializeTypeCache start in startup time? My ask is about makes sense prepopulate the cache in build time?

@MartinGC94
Copy link
Contributor Author

@iSazonov when you attempt to complete a namespace or type the cache is initialized if the cache is null. The cache is invalidated AKA changed to null whenever a new assembly is loaded.
I like the idea of having a factory default cache that is built at compile time that includes all the types that are always available to PowerShell. I don't know how I would implement that though and we would still need a dynamic cache to handle new types from external sources.

@iSazonov
Copy link
Collaborator

@MartinGC94 My main concern is whether the cache initialized in non-interactive scenarios like schedule task and at startup time in other scenarios - all these could be cause noticeable delay. If yes it makes sense to prepopulate the cache otherwise it is initialized at first tab-completion and I guess users will see no delay really and we have no need to complicate.

@MartinGC94
Copy link
Contributor Author

MartinGC94 commented Feb 14, 2022

@iSazonov Non-interactive scenarios are not affected. As mentioned, it's only initialized when tab completing a namespace/type for the first time, which is the same way it works today.
I still see value in creating a portion of the cache in advance because a delay of almost 200ms is noticeable, and this is on a good computer. I've seen systems where the first type completion took a few seconds to complete.
This is probably not the right PR to introduce this precompiled cache however, since it's already pretty big.

@iSazonov
Copy link
Collaborator

since it's already pretty big

Since you rebuild all type cache code it already makes no sense :-)

I suggest you consider to create Source Generator. You can use follow examples:

Also I suggest you consider to have two type cache - (1) prepopulated with SG, (2) created at run time for new loaded dll-s.

@ghost ghost added the Review - Needed The PR is being reviewed label Feb 21, 2022
@ghost
Copy link

ghost commented Feb 21, 2022

This pull request has been automatically marked as Review Needed because it has been there has not been any activity for 7 days.
Maintainer, please provide feedback and/or mark it as Waiting on Author

@MartinGC94
Copy link
Contributor Author

I've thought of something, currently type accelerators use the resolved shortname as the list item text, for example float gets Single as its list item text but wouldn't it make more sense for it to use the type accelerator name? I mean if you are typing in "Float" and you get an IntelliSense menu you would naturally look for "Float" and not "Single", right?

@iSazonov
Copy link
Collaborator

iSazonov commented Mar 5, 2022

This seems like overcomplicating.

@ghost ghost removed the Review - Needed The PR is being reviewed label Mar 5, 2022
@MartinGC94
Copy link
Contributor Author

@iSazonov What do you mean overcomplicating? Assuming it's in response to my last comment, it's just a matter of changing a single line to use a different property when building the completion result.

@iSazonov
Copy link
Collaborator

iSazonov commented Mar 7, 2022

Assuming it's in response to my last comment, it's just a matter of changing a single line to use a different property when building the completion result.

Yes, it is for your comment. Personally I don't like this - looks weird and mislead me.

@SeeminglyScience
Copy link
Collaborator

SeeminglyScience commented Mar 7, 2022

I agree that it would have made sense to have ListItemText match the CompletionText. That's always been strange to me. The tooltip still shows the underlying type anyway.

That said, it will create confusion in some places. Like if you go to complete [Parameter you'll get System.Reflection.Metadata.Parameter and ParameterAttribute with the same list item text. That's sort of it's own issue, because every time I go to do it I forget I need to look for ParameterAttribute because I don't want it to complete as that. But anyway worth noting as a potential downside.

Edit: And I see you point out the issue that would create in PSRL as well where it would only ever complete SRM.Parameter due to duplicate list item texts :/

@ghost ghost added the Review - Needed The PR is being reviewed label Mar 16, 2022
@ghost
Copy link

ghost commented Mar 16, 2022

This pull request has been automatically marked as Review Needed because it has been there has not been any activity for 7 days.
Maintainer, please provide feedback and/or mark it as Waiting on Author

@pull-request-quantifier-deprecated

This PR has 799 quantified lines of changes. In general, a change size of upto 200 lines is ideal for the best PR experience!


Quantification details

Label      : Extra Large
Size       : +521 -278
Percentile : 93.3%

Total files changed: 6

Change summary by file extension:
.cs : +460 -271
.ps1 : +61 -7

Change counts above are quantified counts, based on the PullRequestQuantifier customizations.

Why proper sizing of changes matters

Optimal pull request sizes drive a better predictable PR flow as they strike a
balance between between PR complexity and PR review overhead. PRs within the
optimal size (typical small, or medium sized PRs) mean:

  • Fast and predictable releases to production:
    • Optimal size changes are more likely to be reviewed faster with fewer
      iterations.
    • Similarity in low PR complexity drives similar review times.
  • Review quality is likely higher as complexity is lower:
    • Bugs are more likely to be detected.
    • Code inconsistencies are more likely to be detected.
  • Knowledge sharing is improved within the participants:
    • Small portions can be assimilated better.
  • Better engineering practices are exercised:
    • Solving big problems by dividing them in well contained, smaller problems.
    • Exercising separation of concerns within the code changes.

What can I do to optimize my changes

  • Use the PullRequestQuantifier to quantify your PR accurately
    • Create a context profile for your repo using the context generator
    • Exclude files that are not necessary to be reviewed or do not increase the review complexity. Example: Autogenerated code, docs, project IDE setting files, binaries, etc. Check out the Excluded section from your prquantifier.yaml context profile.
    • Understand your typical change complexity, drive towards the desired complexity by adjusting the label mapping in your prquantifier.yaml context profile.
    • Only use the labels that matter to you, see context specification to customize your prquantifier.yaml context profile.
  • Change your engineering behaviors
    • For PRs that fall outside of the desired spectrum, review the details and check if:
      • Your PR could be split in smaller, self-contained PRs instead
      • Your PR only solves one particular issue. (For example, don't refactor and code new features in the same PR).

How to interpret the change counts in git diff output

  • One line was added: +1 -0
  • One line was deleted: +0 -1
  • One line was modified: +1 -1 (git diff doesn't know about modified, it will
    interpret that line like one addition plus one deletion)
  • Change percentiles: Change characteristics (addition, deletion, modification)
    of this PR in relation to all other PRs within the repository.


Was this comment helpful? 👍  :ok_hand:  :thumbsdown: (Email)
Customize PullRequestQuantifier for this repository.

@TravisEz13 TravisEz13 added the WG-Engine core PowerShell engine, interpreter, and runtime label May 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Extra Large Review - Needed The PR is being reviewed WG-Engine core PowerShell engine, interpreter, and runtime
Projects
None yet
4 participants