Feature Request: AutoImport missing library #93

Closed
zhou13 opened this issue Apr 6, 2023 · 6 comments
@zhou13

zhou13 commented Apr 6, 2023

One of the problems in Python is that you often have to go to the beginning of a file and write code such as import os or import numpy as np the first time you use a library in that file. This is a hassle. Right now, I configure my editor so that it runs tidy-import from pyflyby to automatically add missing imports when I save the file. I just need to keep a common set of imports at ~/.pyflyby. Unfortunately, tidy-import is slow to run (it takes ~0.5s on 10 lines of code).

I think it would be best if ruff could support this feature, given its speed. I am not sure whether that is within ruff's scope. I put this feature request under ruff-lsp because I felt it might be hard to configure the auto-import behavior for the ruff tool itself, since pyproject.toml is intended for linting.
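
For readers unfamiliar with the workflow described above, here is a minimal sketch of the idea in Python. It is not pyflyby's implementation: the `KNOWN_IMPORTS` table and both helper functions are hypothetical names, and the undefined-name detection is a deliberately naive `ast` walk that ignores scoping.

```python
import ast
import builtins

# Hypothetical stand-in for the import table a user would keep in ~/.pyflyby.
KNOWN_IMPORTS = {
    "os": "import os",
    "np": "import numpy as np",
    "Path": "from pathlib import Path",
}


def find_unbound_names(source: str) -> set:
    """Return names that are read but never bound anywhere in the module.

    Naive on purpose: scoping is ignored, so this only illustrates the idea.
    """
    bound = set(dir(builtins))
    used = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            # Load context means the name is read; Store/Del means it is bound.
            (used if isinstance(node.ctx, ast.Load) else bound).add(node.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                bound.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
    return used - bound


def add_missing_imports(source: str) -> str:
    """Prepend an import line for every unbound name listed in KNOWN_IMPORTS."""
    missing = sorted(n for n in find_unbound_names(source) if n in KNOWN_IMPORTS)
    header = "".join(KNOWN_IMPORTS[name] + "\n" for name in missing)
    return header + source
```

Names that are not in the table are left alone, which is what keeps this kind of fixer conservative: it only adds imports the user has explicitly approved in advance.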

@charliermarsh charliermarsh added the enhancement New feature or request label Apr 6, 2023
@charliermarsh
Member

Very interesting, I've never seen that library before, it's quite clever.

@rchl
Contributor

rchl commented Apr 10, 2023

Just a counterpoint: the decision whether to auto-add an import should IMO be made while writing the specific code that needs it, rather than after the code is written and saved. The reason is that it's sometimes ambiguous where to import from.

Servers like pyright are able to show all import alternatives through the completion popup and this is IMO the right place for the user to decide on that matter.

Doing this on save would be inferior and potentially annoying.
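
The ambiguity concern can be made concrete with a small sketch. The `CANDIDATES` table and the `resolve` helper below are invented for illustration; the point is only that an automatic fixer can act safely when a name has exactly one candidate, and has to defer to the user otherwise.

```python
# Hypothetical candidate table: each undefined name maps to every import
# statement that could plausibly satisfy it.
CANDIDATES = {
    "np": ["import numpy as np"],
    "Path": ["from pathlib import Path", "from matplotlib.path import Path"],
    "partial": ["from functools import partial"],
}


def resolve(name: str):
    """Return (import_line, alternatives) for an undefined name.

    import_line is set only when there is exactly one candidate; otherwise it
    is None and the alternatives are left for the user to choose from.
    """
    options = CANDIDATES.get(name, [])
    if len(options) == 1:
        return options[0], []
    return None, options


for name in ("np", "Path", "missing_name"):
    print(name, resolve(name))
```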

@zhou13
Author

zhou13 commented Apr 11, 2023

I have been using pyflyby (on-save) and pyright (LSP) together for several months. First, I don't see why it is annoying: tidy-import just fixes some missing imports for you. This is even safer than auto-removing unused imports since it has fewer false positives.

Regarding pyright, I am definitely not satisfied with pyright's auto-import feature as it is right now. Maybe it is a bug or some configuration issue. It rarely works for system libraries. For example, if I type np, it never gives me a choice to import numpy as np. The same is true even for os.*. At least in my Vim, pyright's auto-import is only useful for functions inside my own library.

Maybe with a better implementation, pyright-style auto-import will be much more useful. However, I don't think that makes a pyflyby-style fixer inferior. For example, it is at least more seamless when you write code, since you don't need to think about when to interact with the LSP manually, and issues are auto-fixed. It will also be more consistent with ruff's other existing auto-fixers in terms of UI and style.

@V3RGANz

V3RGANz commented May 21, 2023

Pyright deliberately limits its capabilities to favor exclusive tools, focusing predominantly on type-checking microsoft/pyright#4263 (comment)
This is the reason it does not support auto-import as a code action, and has certain shortcomings in its current auto-import tool.
As @zhou13 previously noted, Pyright sometimes fails to suggest accurate imports because, presumably, it is unable to import modules, only symbols from modules. For instance, if I type json, there's no suggestion to auto-import the json module. (UPD: I was wrong about it, this was just bad ranking, as module json was several pages of completions below, but for some reason, os can't be imported indeed)

Additionally, this situation often comes up when you write a line or block of code (or simply paste it) and then try to resolve undefined references using the auto-import code action. When auto-import is available only through completion, you need to retype the last character of the unimported symbol so that the completion popup appears.

Although auto-import on save may not be optimal, a code action that offers a popup with suggested choices would be reasonable.

@charliermarsh charliermarsh self-assigned this Jun 16, 2023
@charliermarsh
Member

We may return to this, but it's not on the near-term roadmap, so I'm gonna close for now to keep the issue tracker actionable.

@charliermarsh charliermarsh closed this as not planned Jun 16, 2023
@hansalemaos

hansalemaos commented Feb 1, 2024

I wrote a little script that might serve as a base

import subprocess
import shutil
import re
import os, sys
import tempfile
from collections import defaultdict
from pathlib import Path


errorcodes = ["F821"]


def touch(path: str) -> bool:
    # touch('f:\\dada\\baba\\caca\\myfile.html')
    # original: https://github.com/andrewp-as-is/touch.py (not working anymore)
    def _fullpath(path):
        return os.path.abspath(os.path.expanduser(path))

    def _mkdir(path):
        path = path.replace("\\", "/")
        if path.find("/") > 0 and not os.path.exists(os.path.dirname(path)):
            os.makedirs(os.path.dirname(path))

    def _utime(path):
        try:
            os.utime(path, None)
        except Exception:
            open(path, "a").close()

    def touch_(path):
        if path:
            path = _fullpath(path)
            _mkdir(path)
            _utime(path)

    try:
        touch_(path)
        return True
    except Exception as Fe:
        print(Fe)
        return False


def get_tmpfile(suffix=".bin"):
    tfp = tempfile.NamedTemporaryFile(delete=False, suffix=suffix)
    filename = tfp.name
    filename = os.path.normpath(filename)
    tfp.close()
    touch(filename)
    return filename


def fix_imports(file):
    if __file__ == file:
        return None
    p = subprocess.run([sys.executable, __file__, file], capture_output=True)
    return p


if __name__ == "__main__":
    # Check if ruff and rg executables are available
    # (shutil.which resolves the .exe suffix on Windows by itself)
    ruffpath = shutil.which("ruff")
    if not ruffpath:
        input(
            "ruff not found! Please install ruff https://github.com/astral-sh/ruff , put it in your path and restart the script"
        )
        sys.exit(1)
    rgpath = shutil.which("rg")
    if not rgpath:
        input(
            "ripgrep not found! Please install ripgrep https://github.com/BurntSushi/ripgrep , put it in your path and restart the script"
        )
        sys.exit(1)

    # Get the path of the Python interpreter
    interpreter_folder = str(Path(sys.executable).parent)

    # Set the file types and get the target file from command-line arguments
    filetypes = "py,pyx"
    file = sys.argv[1]
    file_path_object = Path(file)

    # Check if the specified file exists
    if not file_path_object.exists():
        print(f"File {file} does not exist")
        sys.exit(1)

    # Read the content of the target file
    with open(file, mode="r", encoding="utf-8") as f:
        filecontent = f.read()

    # Extract folder information and create temporary and backup filenames
    file_folder = str(file_path_object.parent)
    if file_folder == ".":
        file_folder = os.getcwd()
    purefile = file_path_object.name
    tmp_file = os.path.join(file_folder, "__" + purefile)
    backupfile = os.path.join(file_folder, purefile + ".bak")

    # Create regular expressions for errorcodes (F821 ...)
    errorcodesregex = re.compile("|".join([f"\\b{x}\\b" for x in errorcodes]))
    folders = [interpreter_folder]
    if interpreter_folder not in file_folder:
        folders.append(file_folder)

    # Newer ruff releases require the explicit `check` subcommand
    p = subprocess.run([ruffpath, "check", file], capture_output=True)
    stdoutrufffirstrun = p.stdout.decode("utf-8", "backslashreplace")
    stdout = stdoutrufffirstrun.splitlines()
    stdoutlist = [
        re.findall(r"""Undefined\s+name\s+`([^`]+)`""", x)
        for x in stdout
        if errorcodesregex.search(x)
    ]

    # Exit if no missing imports are found
    if not stdoutlist:
        sys.exit(0)

    # Collect missing imports
    missingimports = set()

    for x in stdoutlist:
        for y in x:
            missingimports.add(y)

    # Write regular expressions to the temporary file for ripgrep
    regexdict = {}
    regextmpfile = get_tmpfile(suffix=".tmp")
    with open(regextmpfile, mode="w", encoding="utf-8") as f:
        for x in missingimports:
            r1 = rf"^\s*\bimport\b\s+\b{x}\b\s*$"
            r2 = rf"^\s*\bfrom\b\s+[^\s]+\s+\bimport\s+\b{x}\b\s*$"
            r3 = rf"^\s*\bimport\b\s+[^\s]+\s+\bas\b\s+\b{x}\b\s*$"
            r4 = rf"^\s*\bfrom\b\s+[^\s]+\s+\bimport\b\s[^\s]+\s\bas\s+\b{x}\b\s*$"
            f.write(r1)
            f.write("\n")
            f.write(r2)
            f.write("\n")
            f.write(r3)
            f.write("\n")
            f.write(r4)
            f.write("\n")

            # to sort ripgrep results
            regexdict[x] = {
                "r1": re.compile(r1),
                "r2": re.compile(r2),
                "r3": re.compile(r3),
                "r4": re.compile(r4),
            }

    foundimportlinescounter = defaultdict(int)
    foundimportlines = []

    # Search for missing imports in specified folders using ripgrep
    for folder_to_search in folders:
        results = subprocess.run(
            [
                rgpath,
                "-f",
                regextmpfile,
                "-g",
                f"*.{{{filetypes}}}",
                "-o",
                "--no-line-number",
                "--multiline",
                "-I",
                "--trim",
                "--case-sensitive",
                "--color=never",
                "--no-messages",
                "--no-unicode",  # faster, but no special chars
                "--no-ignore",
                "-a",
                "--crlf",
                ".",  # search cwd explicitly so rg does not read from a piped stdin
            ],
            capture_output=True,
            cwd=folder_to_search,
            env=os.environ.copy(),
        )

        foundimportlines.extend(
            [
                re.sub(r"\s+", " ", q.strip())
                for q in results.stdout.decode("utf-8", "backslashreplace").splitlines()
            ]
        )

    # Continue processing only if import lines are found
    if not foundimportlines:
        sys.exit(0)

    # Count occurrences of each import line
    for importedline in foundimportlines:
        foundimportlinescounter[importedline] += 1

    # Identify unique import lines
    resultstdout = set(foundimportlines)
    foundpackagesdict = {}

    # Match import lines with regular expressions and organize results
    for resultline in resultstdout:
        for k, v in regexdict.items():
            if k not in foundpackagesdict:
                foundpackagesdict[k] = []
            for regexpr in v.items():
                resultregex = regexpr[1].findall(resultline)
                if resultregex:
                    foundpackagesdict[k].append(resultline)

    foundpackagesdict = {
        k: sorted(set(v), key=len) for k, v in foundpackagesdict.items()
    }

    # Extract the best import line for each package (fewest errors / most imports / shortest line)
    ruff_all_results_dict = {}
    for packagename, packageimportlines in foundpackagesdict.items():
        ruff_all_results = []
        ruff_all_results_dict[packagename] = []
        for packageimportline in packageimportlines:
            with open(tmp_file, mode="w", encoding="utf-8") as f:
                f.write(packageimportline)
                f.write("\n")
                f.write(filecontent)
            ptest = subprocess.run([ruffpath, "check", tmp_file], capture_output=True)
            stdoutrufftestrun = ptest.stdout.decode("utf-8", "backslashreplace")
            ruff_all_results.append(
                [
                    len(stdoutrufftestrun.strip().splitlines()),
                    -foundimportlinescounter.get(packageimportline, -1),
                    len(packageimportline),
                    packagename,
                    packageimportline,
                ]
            )
            ruff_all_results_dict[packagename] = ruff_all_results

        if ruff_all_results_dict[packagename]:
            ruff_all_results_dict[packagename] = sorted(
                ruff_all_results_dict[packagename],
                key=lambda x: x[:3],  # fewest errors / most imports / shortest line
            )[0][-1]

    # Backup the original file content
    with open(backupfile, mode="w", encoding="utf-8") as f:
        f.write(filecontent)

    # Write the modified file with the optimized import lines
    with open(file, mode="w", encoding="utf-8") as f:
        for k, v in ruff_all_results_dict.items():
            if v:
                f.write(v)
                f.write("\n")
        f.write(filecontent)

    # Attempt to remove the temporary files (ruff test file and ripgrep pattern file)
    for leftover in (tmp_file, regextmpfile):
        try:
            os.remove(leftover)
        except Exception as e:
            print(e)
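
The two core steps of the script above, extracting undefined names from ruff's F821 diagnostics and ranking candidate import lines, can be isolated into a compact sketch. The helper names are hypothetical and the sample diagnostics and candidate tuples are made up for illustration; the regex and the sort order (fewest remaining errors, most occurrences, shortest line) mirror the ones used in the script.

```python
import re

UNDEFINED_NAME = re.compile(r"Undefined\s+name\s+`([^`]+)`")


def extract_undefined_names(ruff_output: str) -> set:
    """Collect the names reported on F821 lines of ruff's text output."""
    names = set()
    for line in ruff_output.splitlines():
        if re.search(r"\bF821\b", line):
            names.update(UNDEFINED_NAME.findall(line))
    return names


def pick_best(candidates):
    """Pick one import line from (remaining_errors, occurrences, line) tuples.

    Ordering: fewest remaining ruff errors, then most occurrences in the
    searched folders, then shortest line. Returns None for an empty list.
    """
    ranked = sorted(candidates, key=lambda c: (c[0], -c[1], len(c[2])))
    return ranked[0][2] if ranked else None


# Made-up sample data, for illustration only.
sample_output = (
    "demo.py:1:7: F821 Undefined name `np`\n"
    "demo.py:2:1: F401 `os` imported but unused\n"
    "demo.py:3:5: F821 Undefined name `Path`\n"
)
sample_candidates = [
    (2, 40, "import numpy as np"),
    (2, 3, "from foo import bar as np"),
    (5, 90, "import np"),
]

print(sorted(extract_undefined_names(sample_output)))
print(pick_best(sample_candidates))
```

Ranking by remaining errors first means a frequently seen but wrong import line (such as `import np` in the sample) loses to a rarer line that actually resolves the name.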
