Commit
add: better support for converter tool on windows
saemideluxe committed Nov 13, 2022
1 parent 68bf97d commit 0c0a8dc
Showing 2 changed files with 17 additions and 6 deletions.
19 changes: 14 additions & 5 deletions htmlgenerator/contrib/convertfromhtml.py
@@ -1,4 +1,5 @@
 import codecs
+import os

 import black # type: ignore
 from bs4 import BeautifulSoup, Comment, Doctype, NavigableString, Tag # type: ignore
@@ -122,10 +123,12 @@ def parsehtml(html, formatting, compact):
 html = hg.BaseElement(""",
     ]

-    soup = BeautifulSoup(
-        html,
-        "lxml",
-    )
+    if os.name == "nt":
+        parser = "html.parser"
+    else:
+        parser = "lxml"
+
+    soup = BeautifulSoup(html, parser)
     for subtag in soup.contents:
         tags = convert(subtag, 1, compact)
         if tags:
@@ -146,6 +149,7 @@ def main():

     formatflag = "--no-formatting"
     compactflag = "--compact"
+    encodingflag = "--encoding"

     files = sys.argv[1:]
     formatting = formatflag not in files
@@ -154,10 +158,15 @@ def main():
         files.remove(formatflag)
     if compactflag in files:
         files.remove(compactflag)
+    if encodingflag in files:
+        encoding = files[files.index(encodingflag) + 1]
+        files.remove(encodingflag)
+        files.remove(encoding)

     if not files:
         print(parsehtml(sys.stdin.read(), formatting, compact), end="")
     for _file in files:
-        with open(_file) as rf:
+        with open(_file, encoding=encoding) as rf:
             with open(_file + ".py", "w") as wf:
                 wf.write(parsehtml(rf.read(), formatting, compact))

4 changes: 3 additions & 1 deletion setup.py
@@ -25,7 +25,9 @@
     packages=find_packages(),
     zip_safe=False,
     include_package_data=True,
-    extras_require={"all": ["black", "beautifulsoup4", "lxml"]},
+    extras_require={
+        "all": ["black", "beautifulsoup4", "lxml;platform_system!='Windows'"]
+    },
     entry_points={
         "console_scripts": [
             "convertfromhtml = htmlgenerator.contrib.convertfromhtml:main",
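The Windows handling in this commit has two halves that need to agree. setup.py skips lxml when the platform_system marker is "Windows", and parsehtml switches BeautifulSoup to the built-in "html.parser" when os.name == "nt". A minimal sketch of that pairing follows; the helper names choose_parser and lxml_is_installed_for are hypothetical and not part of htmlgenerator.

```python
import os
import platform


def choose_parser(os_name: str = os.name) -> str:
    """Hypothetical helper mirroring the commit's rule in parsehtml."""
    # os.name is "nt" on native Windows builds of Python; use the
    # parser that ships with Python there, lxml everywhere else.
    return "html.parser" if os_name == "nt" else "lxml"


def lxml_is_installed_for(system: str = platform.system()) -> bool:
    """Hypothetical helper mirroring the marker lxml;platform_system!='Windows'.

    The PEP 508 marker variable platform_system corresponds to platform.system().
    """
    return system != "Windows"


if __name__ == "__main__":
    print(choose_parser(), lxml_is_installed_for())
```

On native Windows both checks flip together, so parsehtml never asks BeautifulSoup for an lxml parser that pip skipped.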

5 comments on commit 0c0a8dc

@Eiltherune

I'm glad my edits were helpful! However, in convertfromhtml.py you left out the initialized value of _encoding, which is needed in case there is no encodingflag set. Just add the line _encoding = 'utf8' on line 153, and change line 169 to say encoding=_encoding, and everything should work properly.
Also, you may want to update README.md to include this update to the conversion tool, in case other Windows users need to make use of it.
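The suggested fix comes down to giving the encoding a value before checking for the flag. Below is a minimal sketch of main()'s flag handling with that default. It assumes a hypothetical helper, parse_flags, and uses the 'utf8' default proposed in this comment; the maintainer's reply below uses the system default instead. The sketch also rejects a trailing --encoding with no value, which the committed code would turn into an IndexError.

```python
import sys
from typing import List, Tuple

FORMATFLAG = "--no-formatting"
COMPACTFLAG = "--compact"
ENCODINGFLAG = "--encoding"


def parse_flags(argv: List[str]) -> Tuple[List[str], bool, bool, str]:
    """Hypothetical helper: split argv into (files, formatting, compact, encoding)."""
    files = list(argv)  # copy so the caller's list is not mutated
    formatting = FORMATFLAG not in files
    compact = COMPACTFLAG in files
    encoding = "utf8"  # default used when --encoding is not passed
    if FORMATFLAG in files:
        files.remove(FORMATFLAG)
    if COMPACTFLAG in files:
        files.remove(COMPACTFLAG)
    if ENCODINGFLAG in files:
        index = files.index(ENCODINGFLAG)
        if index + 1 >= len(files):
            raise SystemExit(f"{ENCODINGFLAG} requires a value, e.g. {ENCODINGFLAG} cp1252")
        encoding = files[index + 1]
        # Drop the flag and its value by position rather than by value,
        # so a file that happens to share the encoding's name is kept.
        del files[index : index + 2]
    return files, formatting, compact, encoding


if __name__ == "__main__":
    print(parse_flags(sys.argv[1:]))
```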

@Eiltherune commented on 0c0a8dc Nov 14, 2022

Also, the extras_require in setup.py needs to either be changed to install_require={"all":... (as the dependencies are mandatory for all portions of htmlgenerator to function properly), or to extras_require={"converter":... to only install the dependencies if the user plans to use convertfromhtml. I believe the "proper" solution is to use converter, and then add a note in README.md stating that the command-line tool needs to be installed by using pip install "htmlgenerator[converter]".

@saemideluxe (Member, Author)

> I'm glad my edits were helpful! However, in convertfromhtml.py you left out the initialized value of _encoding, which is needed in case there is no encodingflag set. Just add the line _encoding = 'utf8' on line 153, and change line 169 to say encoding=_encoding, and everything should work properly. Also, you may want to update README.md to include this update to the conversion tool, in case other Windows users need to make use of it.

Oops, that went missing (I only have tests for the HTML parser, but none for parsing the command-line flags). I fixed it, but I initialize it with the system's default encoding, which mirrors the original behaviour when the flag is not passed. Actually, I am not sure whether we should consider the encoding parameter at all when writing the file. In your case you got a "cp1252"-encoded file as output, didn't you?

README.md is updated too now.
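The question of which encoding to use for the output file is easier to see when reading and writing are separated. The sketch below reads the HTML with the requested encoding, falling back to the system's preferred encoding as in the fix described in this reply. It writes the generated .py file with an independently chosen output encoding. The helper convert_file, its out_encoding parameter, and the transform callable that stands in for parsehtml are all hypothetical.

```python
import locale
import os
import tempfile
from typing import Callable, Optional


def convert_file(
    path: str,
    transform: Callable[[str], str],
    encoding: Optional[str] = None,
    out_encoding: Optional[str] = None,
) -> str:
    """Hypothetical helper: read `path`, transform the text, write `path + ".py"`."""
    # No --encoding given: use the system's preferred encoding, mirroring a plain open(path).
    default = locale.getpreferredencoding(False)
    with open(path, encoding=encoding or default) as rf:
        source = rf.read()
    out_path = path + ".py"
    # The output encoding is a separate decision; it also falls back to the system default.
    with open(out_path, "w", encoding=out_encoding or default) as wf:
        wf.write(transform(source))
    return out_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        page = os.path.join(tmp, "page.html")
        with open(page, "wb") as f:
            f.write(b"<p>caf\xe9</p>")  # "café" encoded as cp1252
        out = convert_file(page, lambda s: s, encoding="cp1252", out_encoding="utf-8")
        with open(out, "rb") as f:
            print(f.read())  # b'<p>caf\xc3\xa9</p>'
```

Keeping the two parameters separate means a cp1252 input can still produce a UTF-8 .py file, which is the default source encoding Python assumes when it imports a module.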

@saemideluxe (Member, Author)

> Also, the extras_require in setup.py needs to either be changed to install_require={"all":... (as the dependencies are mandatory for all portions of htmlgenerator to function properly), or to extras_require={"converter":... to only install the dependencies if the user plans to use convertfromhtml. I believe the "proper" solution is to use converter, and then add a note in README.md stating that the command-line tool needs to be installed by using pip install "htmlgenerator[converter]".

I am not aware that install_require can take a dictionary. What do you mean by that?

Regarding the "converter" extra, I will keep the name "all". I prefer "all" because many Python packages use it as the name of the extra that installs every dependency that might be needed.

@Eiltherune commented on 0c0a8dc Nov 14, 2022

> And actually I am not sure if we should even consider the encoding parameter when writing the file. In your case you got a "cp1252" encoded file as output, didn't you?

Yes, running locale.getpreferredencoding() returns cp1252 on my Windows machines and utf8 on my Mac. You're right that the encoding parameter is not needed for this example. I didn't realize that cp1252 and utf8 are compatible for plain ASCII content, but there may still be cases where the encoding flag is necessary.
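cp1252 and UTF-8 encode ASCII characters identically but diverge for everything else. A file that contains only ASCII therefore converts the same way under either setting. A short standard-library demonstration:

```python
ascii_text = "<p>Hello</p>"
accented = "<p>café</p>"

# Pure ASCII: both encodings produce exactly the same bytes.
assert ascii_text.encode("cp1252") == ascii_text.encode("utf-8")

# Non-ASCII: "é" is one byte in cp1252 but two bytes in UTF-8.
print(accented.encode("cp1252"))  # b'<p>caf\xe9</p>'
print(accented.encode("utf-8"))   # b'<p>caf\xc3\xa9</p>'

# Decoding UTF-8 bytes as cp1252 does not fail; it silently produces mojibake.
print(accented.encode("utf-8").decode("cp1252"))  # <p>cafÃ©</p>

# The other direction fails loudly: a lone cp1252 byte is not valid UTF-8.
try:
    accented.encode("cp1252").decode("utf-8")
except UnicodeDecodeError as exc:
    print("cp1252 bytes are not valid UTF-8:", exc.reason)
```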

> I am not aware that install_require can use a dictionary, how do you mean that?

That was a mistake on my part. It should have been install_requires=["black", "beautifulsoup4", "lxml;platform_system!='Windows'"]. I believe that extras_require={"all"... is the same as install_requires=, except that it requires the user to install the package with pip install "htmlgenerator[all]", which would need to be mentioned in the README. The only reason I would use the extras dictionary is so that applications built with htmlgenerator don't automatically bundle dependencies that only the converter tool needs.
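The two setuptools keywords take different shapes. install_requires is a flat list of requirement strings that pip always installs. extras_require is a dictionary that maps extra names to such lists, and pip installs those only on request, e.g. pip install "htmlgenerator[all]". The sketch below writes both options as keyword-argument dictionaries with hypothetical variable names. The extra is called "all" as the maintainer decided; the commenter's proposed "converter" would only change the key.

```python
# Requirement strings from this commit. The PEP 508 environment marker after ";"
# makes pip skip lxml when platform_system is "Windows".
CONVERTER_DEPS = ["black", "beautifulsoup4", "lxml; platform_system != 'Windows'"]

# Option A: always installed by a plain `pip install htmlgenerator`.
# install_requires takes a list, not a dict.
mandatory_kwargs = {"install_requires": CONVERTER_DEPS}

# Option B (what the commit does): installed only with
# `pip install "htmlgenerator[all]"`. extras_require maps extra names to lists.
optional_kwargs = {"extras_require": {"all": CONVERTER_DEPS}}

# In setup.py, either dictionary would be unpacked into the call,
# e.g. setup(..., **optional_kwargs).
```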
