Expand Session watcher to help auto complete with NSE (tidyverse syntax) #323

Closed
gowerc opened this issue May 14, 2020 · 21 comments · Fixed by #530

Comments

@gowerc
Contributor

gowerc commented May 14, 2020

It would be awesome if the session watcher's autocomplete could be expanded to work with non-standard evaluation, for example when using dplyr's mutate/select/etc.

That is, when writing something like:

iris %>%
    select( 

the autocomplete should pop up with the list of variables in the iris dataset.

@gowerc
Contributor Author

gowerc commented May 23, 2020

For reference, RStudio's implementation of this simply looks at the topmost object in the chain to decide which names to offer in the autocomplete, for example:

[screenshot]

As you can see, RStudio just offers all the names of iris in the autocomplete, even though none of them exist after the gather() command. Having said that, it does have some sense of context: the newly created variable is available in the list. I think even just listing the initial object's variables (ignoring newly created ones) would be a big help to development!
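A minimal R sketch of the "topmost object" approach described above might look like the following. It assumes the chain uses magrittr's `%>%` and is syntactically complete (R's parser rejects a dangling `select(`), and it only inspects the quoted expression without evaluating it. `pipe_head()` and `head_completions()` are hypothetical helpers, not part of RStudio or vscode-R.

```r
# Hypothetical helpers sketching the "topmost object" approach.
# The pipe chain is only inspected as a quoted expression; nothing is evaluated.
pipe_head <- function(expr) {
  # `lhs %>% rhs` is a call whose first element is the symbol `%>%`;
  # follow the left-hand side until it is no longer a pipe call.
  while (is.call(expr) && identical(expr[[1]], as.name("%>%"))) {
    expr <- expr[[2]]
  }
  if (is.name(expr)) as.character(expr) else NA_character_
}

# Offer the column names of the head object, if it is a data frame.
head_completions <- function(expr, envir = globalenv()) {
  head_name <- pipe_head(expr)
  if (is.na(head_name) || !exists(head_name, envir = envir)) {
    return(character(0))
  }
  obj <- get(head_name, envir = envir)
  if (is.data.frame(obj)) names(obj) else character(0)
}

chain <- quote(iris %>% gather(key, value, -Species) %>% select(key))
pipe_head(chain)         # "iris"
head_completions(chain)  # "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
```

Like the RStudio behaviour in the screenshot, this still offers iris's original columns after gather(), because it never looks past the head of the chain.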

@gowerc
Contributor Author

gowerc commented May 31, 2020

@andycraig & @renkun-ken,

Which files are related to the current session watcher autocomplete? For example, when the session watcher is enabled I am able to do this:

[screenshot]

But I'm struggling to work out which part of the code provides this functionality.

@renkun-ken
Member

renkun-ken commented May 31, 2020

On mobile now. If I remember correctly, the session watcher completion is implemented in extension.ts: provideCompletionItems, getBracketCompletionItems, etc.

Please take a look at https://github.com/Ikuyadeu/vscode-R/blob/master/src/extension.ts#L224-L316.

@randy3k
Member

randy3k commented Jun 3, 2020

@renkun-ken
Member

renkun-ken commented Aug 21, 2020

@gowerc I came up with an easy approach to this at https://github.com/renkun-ken/languageserver/tree/token-completion, which simply puts all symbols that appear in the same document into the completion list. This helps complete symbols that already appear in the document, including symbols in pipelines.

[screenshot]

Please give it a try and let me know if this is helpful.
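As a rough illustration of the idea (not languageserver's actual implementation), base R's parser can list every symbol that appears in a piece of document text. This sketch assumes the text parses as complete R code; `document_symbols()` is a hypothetical helper.

```r
# Hypothetical sketch of document-based "token completion" using base R's parser.
document_symbols <- function(text) {
  parsed <- parse(text = text, keep.source = TRUE)
  pd <- utils::getParseData(parsed)
  # SYMBOL: names such as `x`, `iris` or `score`;
  # SYMBOL_SUB: argument names such as `score` in `mutate(score = ...)`.
  # Function names (token SYMBOL_FUNCTION_CALL) like mutate() are left out.
  unique(pd$text[pd$token %in% c("SYMBOL", "SYMBOL_SUB")])
}

doc <- "
x <- iris
x %>%
  mutate(score = Petal.Length + Petal.Width) %>%
  filter(score > 2)
"
document_symbols(doc)  # the symbols x, iris, score, Petal.Length and Petal.Width
```

Because it works from the document text alone, this kind of completion can offer `score` even though no such column exists in any object in the user session.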

@gowerc
Contributor Author

gowerc commented Aug 21, 2020

Thanks for looking into this, will try and find some time to have a play!

@renkun-ken
Member

To provide completion items in a pipe expression without a parser, it is a bit trickier to detect which variable's names to show as completions, although we already have the names() of all variables in the global environment in globalenv.json, created by the session watcher. I suggest the following. Given a pipeline like:

my_dt %>%
  mutate(var3 = test) %>%
  select(var1, var2, var3)

As a starting point, we should detect my_dt in this case, and hopefully var3 in the future.
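Both steps can be sketched in R. This assumes R's own parser is available and the chain is complete, unlike the extension, which works on raw editor text. `pipe_info()` and the list of column-creating verbs are hypothetical choices for illustration.

```r
# Hypothetical sketch: walk a magrittr pipe chain (without evaluating it) and
# report the data object at its head plus any columns created inline.
pipe_info <- function(expr,
                      creators = c("mutate", "transmute", "summarise", "summarize")) {
  created <- character(0)
  while (is.call(expr) && identical(expr[[1]], as.name("%>%"))) {
    rhs <- expr[[3]]
    # Named arguments of mutate()-like calls, e.g. `var3 = test`, define new columns.
    if (is.call(rhs) && is.name(rhs[[1]]) && as.character(rhs[[1]]) %in% creators) {
      arg_names <- names(rhs)[-1]
      # The walk goes from the last step backwards, so prepend to keep pipeline order.
      created <- c(arg_names[nzchar(arg_names)], created)
    }
    expr <- expr[[2]]
  }
  list(head = if (is.name(expr)) as.character(expr) else NA_character_,
       created = unique(created))
}

chain <- quote(
  my_dt %>%
    mutate(var3 = test) %>%
    select(var1, var2, var3)
)
pipe_info(chain)  # head: "my_dt", created: "var3"
```

The completion list for `select(` could then combine `names(my_dt)` from globalenv.json with the `created` names.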

@danielbasso
Contributor

danielbasso commented Jan 20, 2021

My totally abstract idea for now is to dynamically compute names() for the whole pipe chain using only the first row of the data frame, to keep the computation fast. Using only the first row shouldn't affect select(), group_by(), summarise() and others. It could affect filter(), but I guess this can be ignored?

Edit:

Out of curiosity: apparently dplyr operations work normally even with no rows at all. For example:

library(dplyr)

my_df <- tibble(var1 = c(1, 2, 3), var2 = c(4, 5, 6), var3 = c(7, 8, 9))

my_df %>%
  filter(var1 == 4) %>%
  group_by(var2) %>%
  summarise(var4 = var1 + var2) %>%
  names()

Correctly returns:

[1] "var2" "var4"

This holds even with a filter(var1 == 4) that returns no rows at all. Like I said, I have no idea how feasible this approach is, but I have a feeling that this could be 'exploited'.

@renkun-ken
Member

> My totally abstract idea by now is to dynamically compute names() for the whole pipe chain using only the first row of the dataframe, to make computations fast ...

I'm afraid we cannot execute user code in any way, because it might have side effects on the user session. Imagine a user mutating or removing columns in place, or executing code with important side effects, such as:

my_df %>%
  filter(var == 4) %>%
  select(path) %>%
  remove_files()

@renkun-ken
Member

renkun-ken commented Jan 21, 2021

@gowerc @danielbasso would you like to try PR #530 by installing the artifact in https://github.com/Ikuyadeu/vscode-R/actions/runs/500273775?

@gowerc
Contributor Author

gowerc commented Jan 21, 2021

Sure, will try and give it a test this weekend!

@renkun-ken
Member

Sorry, it is not very useful at the moment since it only works with very limited cases. I'll improve it later.

@danielbasso
Contributor

> Sure will try and give it a test this weekend!

Same here!

> Sorry, it is not very useful at the moment since it only works with very limited cases. I'll improve it later.

No worries man, this is volunteer work, do it at your own pace. I'll try to help whenever I can.

@renkun-ken
Member

@gowerc @danielbasso You could try the latest development build and see if it works for you.

@gowerc
Contributor Author

gowerc commented Jan 25, 2021

Hey @renkun-ken,

First off, this is awesome, thank you for your work on this!!

It wasn't clear to me from the above conversation whether inline variables (those created within the pipeline) were supposed to be made available in this update. For me at least, the base data frame's variables are available but not those defined inline:

EDIT: You can't see it in the screenshot due to the tooltip, but I had defined a variable called score, which does not appear in the drop-down box.

[screenshot]

@renkun-ken
Member

@gowerc If I understand correctly, you want:

x <- iris
x %>%
  mutate(score = Petal.Length + Petal.Width) %>%
  filter()

The autocompletion triggered by ( does indeed only show the existing columns of x in the user session as completion items. The score item will appear once you start typing it; it is provided by languageserver's token completion, which is based on code analysis rather than on the user session's global environment.

[screenshot]

@gowerc
Contributor Author

gowerc commented Jan 26, 2021

Indeed, it would be great if it could also be included in the autocomplete triggered by (, but absolutely no worries if that's not possible; this is already an amazing improvement as is :)

@renkun-ken
Member

I filed REditorSupport/languageserver#369 that adds ( to the completion trigger characters in languageserver.

@KatlehoJordan

I'm just starting to use the R extension in VS Code to replace RStudio. It seems to me that the original ask of this issue has still not been addressed.

If I create a new file, dplyr pipelines using the pipe operator %>% still do not make the variables of the source dataset available for autocompletion.

Specifically:

library(dplyr)
iris %>%
   select(

The IntelliSense that appears after select( does not list the variables of the iris dataset, which was the original ask and is standard behavior in RStudio.

Am I doing something wrong? Or is this a feature that "won't be done", hence this issue being closed?

Thanks for your help!

@renkun-ken
Member

Here iris is somewhat special: it is not user-defined but a built-in dataset, which is not supported yet. If the data is user-defined, it should already work.
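A quick check in a plain R session illustrates the difference, assuming (as described earlier in the thread) that the completion data comes from the objects the session watcher finds in the global environment. `iris` is found on the search path, in the attached datasets package, but not in the global environment itself. `in_globalenv()` is a hypothetical helper.

```r
# Does an object live in the global environment itself (not just on the search path)?
in_globalenv <- function(name) exists(name, envir = globalenv(), inherits = FALSE)

in_globalenv("iris")       # FALSE: iris lives in package:datasets
find("iris")               # "package:datasets"

my_data <- datasets::iris  # a user-defined copy in the global environment
in_globalenv("my_data")    # TRUE, so its column names can be offered
```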

@KatlehoJordan

KatlehoJordan commented Dec 11, 2022

I suspected that if it does not work with iris, it would also not work with any dataset whose symbols had not already been seen in the workspace.

To test, I created a .csv file with a simple dataframe-like structure and used my_data <- read.csv(...) to import it.

I can confirm that you are correct; when I subsequently use my_data %>% select( my variable names are found and suggested.

This is really terrific!

Thank you for this extension and all of its features! 🥳


5 participants