@Ar4l Ar4l commented Mar 15, 2024

This PR updates the VSC plugin and adds the corresponding APIs for a user study on developer interactions with code completions. The JetBrains plugin and its APIs are unaffected. User-study data is stored under a new subdirectory, data_aral.

The general motivation behind the modifications proposed in this PR is to more closely resemble SOTA code-completion tools, which continuously generate suggestions, while also preventing the target LLMs from generating suggestions unnecessarily.

Fixes

  • Plugin build configuration for hot-reloading during development (seemed outdated).
  • Extension objects are now disposed of when the plugin is disabled.
  • Fixes rankings in VSCode to show Code4Me completions at the top.
  • Bugfix in CodeGPT generation: the prefix was trimmed twice, causing errors when the cursor is on an empty line. This was also a likely source of memory leaks, as the resulting tensors are created on the GPU and never disposed of.
  • Debounces automatic invocations until the user has stopped typing (and labels them 'auto' instead of 'manual').
  • Stored JSON fields now follow the Pythonic snake_case convention, as they will be used in data analysis anyway.
  • The language field is now determined using the VSC API instead of the file extension.
  • splitTextAtCursor was flaky roughly 20% of the time, placing one character too few in the prefix because it used the global position & document (which may not be perfectly synchronised with the moment provideCompletionItems is called).
  • General prefix & suffix character matching to fit the completion properly into the surrounding code. The previous hard-coded rules caused (1) hidden/deprioritised completions and (2) very annoying additional brackets; they are replaced by a general algorithm.
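The debounce fix above can be sketched as follows. This is a minimal Python illustration of the idea only (the plugin itself implements it in TypeScript against the VSC API); the `Debouncer` class and its interface are assumptions for illustration, not the plugin's actual code.

```python
import threading


class Debouncer:
    """Delay a callback until `wait` seconds after the *last* trigger.

    Hypothetical sketch of the client-side debounce: every keystroke
    cancels the pending invocation, so only the pause after the final
    keystroke fires an 'auto' completion request.
    """

    def __init__(self, wait: float):
        self.wait = wait
        self._timer = None

    def trigger(self, fn, *args):
        # Cancel any pending invocation: rapid typing never fires.
        if self._timer is not None:
            self._timer.cancel()
        self._timer = threading.Timer(self.wait, fn, args)
        self._timer.start()
```

With this shape, three rapid triggers result in a single invocation labelled 'auto' once the user pauses.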

Adds

  • Tracks manual invocations.

  • New idle invocation type, triggered after the user has not interacted for 2 s. If no completions are received, nothing is shown (as opposed to IntelliSense's default 'No Completions' indicator).

  • If a completion is generated while the user is still typing, we try to match the last few typed characters against the completion so they are not duplicated.

  • Stores the times at which a suggestion is displayed and accepted.

  • Explicitly receives and sends the models, so the model that generated an accepted completion can be retrieved deterministically even if the completion is identical to another model's. Also shows the model to the user when they press ⌃Space again.

  • The context for completions is always stored in the user study. The README is updated to reflect that developers who use the tool agree to these terms.

  • Server-side filter for rejecting completions likely to be ignored by, or useless to, the user. The user is assigned one of four filters per session (a session groups completions no more than 30 minutes apart). The filters are:

    1. A simple logistic-regression model leveraging telemetry data.
    2. A CodeBERTa model fine-tuned on the code context surrounding the cursor.
    3. Two JonBERTa models (custom architecture) leveraging both telemetry and code context.
  • Option for testing the deployment locally: run the Flask app with the environment variables CODE4ME_TEST=true and CODEGPT_CHECKPOINT_PATH set to a model from HF (e.g. 'microsoft/CodeGPT-small-py').
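The overlap matching mentioned above (deduplicating characters the user typed while a completion was being generated) amounts to: find the longest suffix of the already-typed text that is also a prefix of the completion, and trim it. A minimal Python sketch of that idea; the function name and exact policy are assumptions, not the plugin's actual code.

```python
def trim_overlap(typed: str, completion: str) -> str:
    """Drop from `completion` the longest prefix the user already typed,
    i.e. the longest suffix of `typed` that is also a prefix of
    `completion`, so that accepting the suggestion does not duplicate
    characters. Illustrative sketch only.
    """
    max_k = min(len(typed), len(completion))
    # Greedy: prefer the longest possible overlap.
    for k in range(max_k, 0, -1):
        if typed.endswith(completion[:k]):
            return completion[k:]
    return completion
```

For example, if the user has typed `foo.ba` and the model returns `bar()`, only `r()` should be inserted.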
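The per-session filter assignment described above can be sketched as follows, assuming a uniform random draw of one filter per session. The filter names and the details of the assignment policy here are placeholders for illustration, not the server's actual identifiers.

```python
import random

SESSION_GAP_S = 30 * 60  # completions <=30 min apart share a session
# Placeholder names for the four filters (the two JonBERTa variants
# are labelled arbitrarily here):
FILTERS = ["logistic_regression", "codeberta", "jonberta_a", "jonberta_b"]


class FilterAssigner:
    """Assign each user one filter per session; a new session (gap of
    more than 30 minutes since the last completion) draws a fresh,
    possibly different, filter. Sketch of the policy, not server code.
    """

    def __init__(self):
        self._last_seen = {}  # user -> timestamp of last completion
        self._filter = {}     # user -> filter for the current session

    def filter_for(self, user: str, now: float) -> str:
        last = self._last_seen.get(user)
        if last is None or now - last > SESSION_GAP_S:
            self._filter[user] = random.choice(FILTERS)
        self._last_seen[user] = now
        return self._filter[user]
```

Within one session the filter stays fixed, so per-session behaviour of each filter can be compared in the analysis.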

Todos

  • Server-side

    • Disabled the survey check in code4me-server/api.py > autocomplete, because it globs a directory with >1M files on every completion request. Consider organising completions under user-hash subdirectories as a more scalable approach.
    • Store model confidence for each completion.
    • Generate multiple predictions per model, ranking them client-side based on similarity to the current context.
    • Figure out & fix the remaining VRAM leaks.
    • Avoid saving empty JSON files.
    • Add additional language support for filters.
    • Stream pipelines using HF datasets generator.
    • Update models to newer code-completion models.
  • Client-side (vsc)

    • Found a bug in VSC ground-truth tracking. MRE: invoke completion on line 3, then delete line 3; this results in tracking lineNumber=1.
    • Collect ground truth at additional intervals (1, 2, 5, 10, 30 minutes).
    • Caching to prevent re-generating completions when the user deletes something.
    • Support for multi-line ghost-text-style completions.
    • Potentially debounce automatic invocations further, to help users stay under their rate limit.
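The caching todo above could look roughly like an LRU cache keyed by (model, prefix, suffix): deleting characters and returning to an earlier cursor state then reuses the earlier generation instead of re-querying the model. This is entirely a hypothetical sketch, not existing plugin code.

```python
from collections import OrderedDict


class CompletionCache:
    """LRU cache of completions keyed by (model, prefix, suffix).

    Hypothetical sketch for the caching todo: a bounded OrderedDict
    where lookups refresh recency and inserts evict the least
    recently used entry once the cache is full.
    """

    def __init__(self, max_size: int = 128):
        self._entries = OrderedDict()
        self._max_size = max_size

    def get(self, model: str, prefix: str, suffix: str):
        key = (model, prefix, suffix)
        completion = self._entries.get(key)
        if completion is not None:
            self._entries.move_to_end(key)  # mark as recently used
        return completion

    def put(self, model: str, prefix: str, suffix: str, completion: str):
        key = (model, prefix, suffix)
        self._entries[key] = completion
        self._entries.move_to_end(key)
        if len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict least recent
```

A real implementation would also need an invalidation policy for edits that change the surrounding context rather than just the cursor position.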

@Ar4l Ar4l force-pushed the aral_user_study branch from 32570c0 to db62a75 on March 17, 2024 15:52
@Ar4l Ar4l marked this pull request as ready for review March 17, 2024 16:25
@Ar4l Ar4l requested a review from FrankHeijden March 17, 2024 16:29
@Ar4l Ar4l force-pushed the aral_user_study branch from d233d8a to 1b0af53 on March 18, 2024 11:48
@Ar4l Ar4l requested a review from FrankHeijden March 18, 2024 11:57
@FrankHeijden FrankHeijden merged commit 620bfa3 into main Mar 18, 2024
@FrankHeijden FrankHeijden deleted the aral_user_study branch March 18, 2024 12:16