Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(gazelle): create py_binary targets for if __name__ == "__main__" #1566

Closed

Conversation

adzenith
Copy link
Contributor

@adzenith adzenith commented Nov 17, 2023

Previously a py_binary target was only created if a file named __main__.py existed in the package. All other Python files get put into one or more py_library targets. This change makes Gazelle look at each file that would otherwise be added to a py_library target and see if it contains a check for if __name__ == "__main__". If such a check is found in the file, then a py_binary target is created for it instead.

No existing tests fail with this change, though I think in real code it's possible this could be a breaking change for target generation. New tests have been added to check the new functionality.

I used the check for if __name__ == "__main__" because that seems like the most Pythonic way of showing that a script is meant to be run.

Previously a `py_binary` target was only created if a file named
`__main__.py` existed in the package. Now we look at each file that
would otherwise be added to a `py_library` target and see if it
contains a check for `if __name__ == "__main__"`. If such a check is
found in the file, then we create a `py_binary` target for it instead.

No existing tests fail with this change, though I think in real code
it's possible this could change target generation. New tests have been
added to check the new functionality.
// hasNameEqualsMain determines if the file contains 'if __name__ == "__main__"'.
func hasNameEqualsMain(path string) bool {
searchString := `if __name__ == ['"]__main__['"]:`
bytesContents, err := os.ReadFile(path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now we are reading the python files once to get the imports and here it would be for once more to check if there is an if __name__ == "main" which may not scale well in super large repos because we have now twice the number of files to process.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me see how the read to get the import works. Maybe I can figure out a way to not read it twice.

if err != nil {
return false
}
match, err := regexp.Match(searchString, bytesContents)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading from the end of the file line by line in reverse and matching the whole line (stripped) could be faster? If you want to still use regexp, you could at least compile the regexp upfront and store it as a var at the top of the file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let me look into making this faster.

@@ -463,6 +470,20 @@ func hasEntrypointFile(dir string) bool {
return false
}

// hasNameEqualsMain determines if the file contains 'if __name__ == "__main__"'.
func hasNameEqualsMain(path string) bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally think that this new feature needs a feature toggle. There should be a gazelle directive that enables the behaviour.

What is more, sometimes developers add if __name__ in order to test the script locally but do not intend to create a binary target for others to consume. I wonder if in those cases we should be able to say in the python file:

if __name__ == "__main__":  # gazelle: ignore
    main()

Copy link
Contributor Author

@adzenith adzenith Nov 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A feature toggle is a great call. Let me look into making that work.

As for the ignore, is there any reason you wouldn't want that to get moved to a py_binary? Then you could bazel run it and test it locally. (If the script is already in a test target, then this change won't move it - it only moves scripts from py_library -> py_binary, not from py_test, because you can already bazel run a test.) I guess I'm just curious when you might want an ignore / might want to keep a script out of a py_binary target.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep the ignore part out of scope for now. It probably does not need to be supported in the initial version.

@adzenith
Copy link
Contributor Author

Thanks for the feedback! I'll work on it and put up an improved version.

@aignas aignas closed this in #1584 Dec 13, 2023
github-merge-queue bot pushed a commit that referenced this pull request Dec 13, 2023
Many existing Python repos don't use `__main__.py` to indicate the the
main module. Instead, they put something like below in any Python files:

```python
if __name__ == "__main__":
  main()
```

This PR makes the Gazelle extension able to recognize main modules like
this, when `__main__.py` doesn't exist. This reduces the need to create
`__main__.py` when enabling Gazelle extensions in existing Python repos.

The current behavior of creating single `py_binary` for `__main__.py` is
preserved and takes precedence. So this is a backward-compatible change.

Closes #1566.
@adzenith adzenith deleted the py_binary-if-name-equals-main branch January 2, 2024 19:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants